Samu Onkalo [Mon, 24 May 2010 21:33:36 +0000 (14:33 -0700)]
lis3: add skeletons for interrupt handlers
The original lis3 driver didn't provide interrupt handlers for click or
threshold events. This patch adds threaded handlers for one or two
interrupt lines on the 8-bit device. The actual interrupt handling is
provided in a separate patch.
Signed-off-by: Samu Onkalo <samu.p.onkalo@nokia.com>
Tested-by: Daniel Mack <daniel@caiaq.de>
Acked-by: Eric Piel <eric.piel@tremplin-utc.net>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Samu Onkalo [Mon, 24 May 2010 21:33:35 +0000 (14:33 -0700)]
lis3: introduce platform data for second ff / wu unit
The 8-bit device has two wakeup / free-fall units, but it was not possible
to configure the second one. This patch adds a configuration entry for it
to the platform data, along with the corresponding changes to the 8-bit
setup function.
High-pass filters were enabled by default. This patch adds a configuration
option for the high-pass filter cut-off frequency and makes it possible to
disable or enable the filter via platform data. Since the control is new
and the filter was previously enabled by default, the new option disables
the filter. This way old platform data remains compatible with the
change.
Signed-off-by: Samu Onkalo <samu.p.onkalo@nokia.com>
Acked-by: Eric Piel <eric.piel@tremplin-utc.net>
Tested-by: Daniel Mack <daniel@caiaq.de>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Samu Onkalo [Mon, 24 May 2010 21:33:34 +0000 (14:33 -0700)]
lis3: separate configuration function for 8 bit device
Separate the configuration function for the 8-bit version of the chip. This
way the generic part of the init function stays a little more readable.
Signed-off-by: Samu Onkalo <samu.p.onkalo@nokia.com>
Acked-by: Eric Piel <eric.piel@tremplin-utc.net>
Tested-by: Daniel Mack <daniel@caiaq.de>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Samu Onkalo [Mon, 24 May 2010 21:33:32 +0000 (14:33 -0700)]
lis3: add missing constants for 8bit device
Definitions for click were missing.
Signed-off-by: Samu Onkalo <samu.p.onkalo@nokia.com>
Acked-by: Eric Piel <eric.piel@tremplin-utc.net>
Tested-by: Daniel Mack <daniel@caiaq.de>
Cc: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joakim Tjernlund [Mon, 24 May 2010 21:33:31 +0000 (14:33 -0700)]
crc32: use __BYTE_ORDER macro for endian detection.
Since crc32.c contains a nifty test program that can be executed in user
space, make sure endian detection works reliably in user space too.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joakim Tjernlund [Mon, 24 May 2010 21:33:31 +0000 (14:33 -0700)]
crc32: major optimization
Precompute more crc32 values (0xcc00, 0xcc0000 and 0xcc000000) into tables.
This increases the table size from 1KB to 4KB but the performance benefit
makes it worth it:
28% faster on MPC8321, 266 MHz
2x faster on Core 2 Duo, 3.1GHz
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
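To illustrate the table layout described in the crc32 optimization above, here is a minimal userspace sketch of a four-table ("slicing-by-4") little-endian CRC-32. It is not the code from lib/crc32.c: the names crc32_sketch_init() and crc32_sketch() are hypothetical, and the reflected polynomial 0xEDB88320 with the usual initial and final inversion is an assumption.

```c
#include <stddef.h>
#include <stdint.h>

/* Four 256-entry tables of 32-bit values: 4 * 256 * 4 bytes = 4KB. */
static uint32_t crc_tab[4][256];

/* Fill the tables; call once before crc32_sketch(). */
void crc32_sketch_init(void)
{
	for (uint32_t i = 0; i < 256; i++) {
		uint32_t c = i;

		for (int bit = 0; bit < 8; bit++)
			c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
		crc_tab[0][i] = c;
	}
	/* Table t holds the CRC of byte i followed by t zero bytes. */
	for (uint32_t i = 0; i < 256; i++)
		for (int t = 1; t < 4; t++)
			crc_tab[t][i] = (crc_tab[t - 1][i] >> 8) ^
					crc_tab[0][crc_tab[t - 1][i] & 0xff];
}

/* CRC-32 over len bytes, consuming four bytes per loop iteration. */
uint32_t crc32_sketch(const unsigned char *p, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	while (len >= 4) {
		crc ^= (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
		       ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
		crc = crc_tab[3][crc & 0xff] ^
		      crc_tab[2][(crc >> 8) & 0xff] ^
		      crc_tab[1][(crc >> 16) & 0xff] ^
		      crc_tab[0][crc >> 24];
		p += 4;
		len -= 4;
	}
	/* Remaining 0..3 bytes use the classic one-table step. */
	while (len--)
		crc = crc_tab[0][(crc ^ *p++) & 0xff] ^ (crc >> 8);
	return crc ^ 0xFFFFFFFFu;
}
```

The single-table version needs only crc_tab[0] (1KB); the three extra tables are what allow four input bytes to be folded in per iteration.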
Tobias Klauser [Mon, 24 May 2010 21:33:30 +0000 (14:33 -0700)]
checkpatch: warn on declaration with storage class not at the beginning
The C99 specification states in section 6.11.5:
The placement of a storage-class specifier other than at the beginning
of the declaration specifiers in a declaration is an obsolescent
feature.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch>
Acked-by: Jean Delvare <khali@linux-fr.org>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
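As a small illustration of the declaration order the checkpatch commit above is about, the sketch below shows the preferred form, with the obsolescent form kept in a comment. The identifiers are hypothetical.

```c
/* Preferred: the storage-class specifier comes first. */
static const int answer_preferred = 42;

/*
 * Obsolescent per C99 6.11.5 (storage class not at the beginning):
 *
 *     const static int answer_obsolescent = 42;
 */

int get_answer(void)
{
	return answer_preferred;
}
```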
Andi Kleen [Mon, 24 May 2010 21:33:29 +0000 (14:33 -0700)]
checkpatch: add check for too short Kconfig descriptions
I've seen various new Kconfigs with rather unhelpful one liner
descriptions. Add a Kconfig warning for a minimum length of the Kconfig
help section.
Right now I arbitrarily chose 4. The exact value can be debated.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Cc: Andy Whitcroft <apw@shadowen.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Shevchenko [Mon, 24 May 2010 21:33:28 +0000 (14:33 -0700)]
drivers: acpi: don't use own implementation of hex_to_bin()
Remove own implementation of hex_to_bin().
Signed-off-by: Andy Shevchenko <ext-andriy.shevchenko@nokia.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Shevchenko [Mon, 24 May 2010 21:33:28 +0000 (14:33 -0700)]
drivers: wireless: use new hex_to_bin() method
Use the hex_to_bin() function instead of a private implementation.
Signed-off-by: Andy Shevchenko <ext-andriy.shevchenko@nokia.com>
Acked-by: John W. Linville <linville@tuxdriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Shevchenko [Mon, 24 May 2010 21:33:27 +0000 (14:33 -0700)]
fs: ldm: don't use own implementation of hex_to_bin()
Remove own implementation of hex_to_bin().
Signed-off-by: Andy Shevchenko <ext-andriy.shevchenko@nokia.com>
Cc: "Richard Russon (FlatCap)" <ldm@flatcap.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Shevchenko [Mon, 24 May 2010 21:33:27 +0000 (14:33 -0700)]
staging: rt2860: use new hex_to_bin() method
Use the hex_to_bin() function instead of a private implementation.
Signed-off-by: Andy Shevchenko <ext-andriy.shevchenko@nokia.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Shevchenko [Mon, 24 May 2010 21:33:26 +0000 (14:33 -0700)]
sysctl: don't use own implementation of hex_to_bin()
Remove own implementation of hex_to_bin().
Signed-off-by: Andy Shevchenko <ext-andriy.shevchenko@nokia.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Shevchenko [Mon, 24 May 2010 21:33:25 +0000 (14:33 -0700)]
usb: atm: speedtch: use new hex_to_bin() method
Use the hex_to_bin() function instead of a private implementation that
potentially has bugs. This requires the hex_to_bin() implementation
introduced by the first patch in the series.
Signed-off-by: Andy Shevchenko <ext-andriy.shevchenko@nokia.com>
Cc: Duncan Sands <duncan.sands@free.fr>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Shevchenko [Mon, 24 May 2010 21:33:25 +0000 (14:33 -0700)]
drivers: isdn: use new hex_to_bin() method
Remove own implementation of hex_to_bin().
Signed-off-by: Andy Shevchenko <ext-andriy.shevchenko@nokia.com>
Acked-by: Tilman Schmidt <tilman@imap.cc>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Shevchenko [Mon, 24 May 2010 21:33:23 +0000 (14:33 -0700)]
lib: introduce common method to convert hex digits
hex_to_bin() is a small function that converts a hex digit to its actual
value. There are plenty of places where such functionality is needed.
[akpm@linux-foundation.org: use tolower(), saving 3 bytes, test the more common case first - it's quicker]
[akpm@linux-foundation.org: relocate tolower to make it even faster! (Joe)]
Signed-off-by: Andy Shevchenko <ext-andriy.shevchenko@nokia.com>
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: Duncan Sands <duncan.sands@free.fr>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Greg Kroah-Hartman <gregkh@suse.de>
Cc: "Richard Russon (FlatCap)" <ldm@flatcap.org>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
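A minimal userspace sketch of the kind of helper the hex_to_bin() commit above describes, following the bracketed notes (test the decimal digits first, then lower-case the character). The name hex_to_bin_sketch() is hypothetical, and returning -1 for a non-hex character is an assumption of this sketch.

```c
#include <ctype.h>

/* Returns 0..15 for a hex digit, or -1 if ch is not a hex digit (assumed). */
int hex_to_bin_sketch(char ch)
{
	if (ch >= '0' && ch <= '9')	/* more common case first */
		return ch - '0';
	ch = (char)tolower((unsigned char)ch);
	if (ch >= 'a' && ch <= 'f')
		return ch - 'a' + 10;
	return -1;
}
```

A caller converting two characters into one byte would typically check both results for -1 before combining them as (hi << 4) | lo.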
Joe Perches [Mon, 24 May 2010 21:33:22 +0000 (14:33 -0700)]
lib/hexdump.c: reduce stack variable size and cleanups
Reduce char linebuf[200] to the actual size required, which is 32 * 3 + 2
+ 32 + 1, i.e. linebuf[131].
Change examples to use bool true not int 1.
Align multiline argument indentation to open parenthesis.
Use temporary for ptr[j] so the ternary expression fits on a single line.
Convert printk ptr from %*p, (int)(2 * sizeof(void *)) to %p as %p uses
the same calculation for size.
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
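The buffer-size arithmetic in the hexdump commit above can be written out as compile-time constants. The meaning attached to each term below (hex columns, separator, ASCII column, terminator) is an interpretation for illustration, and all names are hypothetical; only the numbers come from the changelog.

```c
enum {
	ROWSIZE_MAX  = 32,              /* bytes dumped per line */
	HEX_CHARS    = ROWSIZE_MAX * 3, /* assumed: two hex digits plus a space per byte */
	SEPARATOR    = 2,               /* assumed: gap between hex and ASCII columns */
	ASCII_CHARS  = ROWSIZE_MAX,     /* assumed: one printable character per byte */
	TERMINATOR   = 1,               /* assumed: trailing NUL */
	LINEBUF_SIZE = HEX_CHARS + SEPARATOR + ASCII_CHARS + TERMINATOR
};

/* 96 + 2 + 32 + 1 */
int linebuf_size(void)
{
	return LINEBUF_SIZE;
}
```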
Florian Ragwitz [Mon, 24 May 2010 21:33:21 +0000 (14:33 -0700)]
DYNAMIC_DEBUG: fix documentation errors
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Florian Ragwitz <rafl@debian.org>
Cc: Jason Baron <jbaron@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dan Carpenter [Mon, 24 May 2010 21:33:21 +0000 (14:33 -0700)]
dynamic_debug: small cleanup in ddebug_proc_write()
This doesn't change behavior at all. In the original code, if nwords was
zero then ddebug_parse_query() would return -EINVAL; now we just return it
earlier.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Jason Baron <jbaron@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Florian Mickler [Mon, 24 May 2010 21:33:20 +0000 (14:33 -0700)]
scripts/get_maintainer.pl: default to not include unspecified tags
This changes the option --git-all-signature-types to be disabled by
default.
The effect is that only certain tags (currently Signed-off-by:, Acked-by:
and Reviewed-by:) are used to get addresses of potential maintainers.
This change is motivated by the desire not to 'spam' people unnecessarily:
a Tested-by or a Reported-by is not meant as a hint that those people want
to or are able to review patches to the code in question.
In a quest to find reliable statistics for this I came up with this:
I produced a list of all the tag-signers not already covered by a
signed-off/acked/reviewed tag somewhere in the last year of git history.
Those were 650 addresses of "assumed non-developers".
And to check whether those "assumed non-developers" are professional
testers/reporters worth cc'ing, I then counted their total appearances
in the git log:
469 were mentioned only once.
123 were mentioned twice.
38 three times
8 four times
5 six times
5 five times
1 eight times
1 fourteen times
I believe this supports my thesis that the "non-maintainer tags" are
not actively useful for patch review (except probably for the guy mentioned
fourteen times...).
But of course one could also find arguments to poke holes in these
statistics. For example, they do not account for code locality: a
tested-by on a patch that touches some specific piece of code can be worth
more than a signed-off in another part of the tree.
But... let's err on the safe side, meaning not to spam those people when
in doubt. We already have the signed-offs and the MAINTAINERS file, so this
should be OK. And if need be, the maintainers can always forward the patch.
[I probably could make a diploma thesis out of this changelog :)]
Signed-off-by: Florian Mickler <florian@mickler.org>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Joe Perches <joe@perches.com>
Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Mon, 24 May 2010 21:33:19 +0000 (14:33 -0700)]
scripts/get_maintainer.pl: add .get_maintainer.conf default options file
Allow the use of a .get_maintainer.conf file to control the default
options applied when scripts/get_maintainer.pl is run.
.get_maintainer.conf can contain any valid command-line argument.
File contents are prepended to any additional command line arguments.
Multiple lines may be used, blank lines ignored, # is a comment.
Updated scripts/get_maintainer.pl version to 0.24
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Florian Mickler <florian@mickler.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Mon, 24 May 2010 21:33:17 +0000 (14:33 -0700)]
scripts/get_maintainer.pl: optionally ignore non-maintainer signatures
When using --git to determine who to send a patch to, get_maintainer.pl
will currently include all signatures. This can include signers that simply
report an issue or test a patch. Signers that use such tags are not
necessarily good candidates to review new patches.
This patch allows get_maintainers to optionally restrict output to only
signatures that use Signed-off-by:, Acked-by:, or Reviewed-by:.
Signed-off-by: is included because those are people who are responsible
for the code.
Acked-by: is questionable, but as signers that use this tag tend to be
active linux gatekeepers, false positives are tolerable.
Reviewed-by: is included because signers responsible for the code thought
that the review feedback for a changeset by that signer was valuable.
This patch has been modified from Florian's original submission to change
the supported signature types to the canonical forms and use slightly
different spacing. A couple of spacing issues were also corrected in the
original source. The command line argument was also renamed.
Original-patch-by: Florian Mickler <florian@mickler.org>
Signed-off-by: Florian Mickler <florian@mickler.org>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Mon, 24 May 2010 21:33:16 +0000 (14:33 -0700)]
vsprintf.c: use noinline_for_stack
Mark static functions with noinline_for_stack
Before:
akpm:/usr/src/25> objdump -d lib/vsprintf.o | perl scripts/checkstack.pl
0x00000e82 pointer [vsprintf.o]: 344
0x0000198c pointer [vsprintf.o]: 344
0x000025d6 scnprintf [vsprintf.o]: 216
0x00002648 scnprintf [vsprintf.o]: 216
0x00002565 snprintf [vsprintf.o]: 208
0x0000267c sprintf [vsprintf.o]: 208
0x000030a3 bprintf [vsprintf.o]: 208
0x00003b1e sscanf [vsprintf.o]: 208
0x00000608 number [vsprintf.o]: 136
0x00000937 number [vsprintf.o]: 136
After:
akpm:/usr/src/25> objdump -d lib/vsprintf.o | perl scripts/checkstack.pl
0x00000a7c symbol_string [vsprintf.o]: 248
0x00000ae8 symbol_string [vsprintf.o]: 248
0x00002310 scnprintf [vsprintf.o]: 216
0x00002382 scnprintf [vsprintf.o]: 216
0x0000229f snprintf [vsprintf.o]: 208
0x000023b6 sprintf [vsprintf.o]: 208
0x00002ddd bprintf [vsprintf.o]: 208
0x00003858 sscanf [vsprintf.o]: 208
0x00000625 number [vsprintf.o]: 136
0x00000954 number [vsprintf.o]: 136
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
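The idea behind the vsprintf.c change above is that a helper with its own local buffer, if inlined, adds that buffer to the caller's stack frame; keeping it out of line means the stack is used only while the helper runs. A minimal sketch, assuming GCC or Clang: the macro definition here is a userspace stand-in, and decimal_width() and width_of_sum() are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/* Userspace stand-in (assumption): the marker expands to plain noinline. */
#define noinline_for_stack __attribute__((noinline))

/* The 64-byte buffer lives in this frame only, not in every caller's. */
static noinline_for_stack size_t decimal_width(long value)
{
	char buf[64];

	snprintf(buf, sizeof(buf), "%ld", value);
	return strlen(buf);
}

/* Caller whose own frame stays small because the helper is not inlined. */
size_t width_of_sum(long a, long b)
{
	return decimal_width(a + b);
}
```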
Michael Hennerich [Mon, 24 May 2010 21:33:16 +0000 (14:33 -0700)]
ad525x_dpot: add support for one time programmable pots
New parts supported:
AD5170, AD5171, AD5172, AD5173, AD5273
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Hennerich [Mon, 24 May 2010 21:33:15 +0000 (14:33 -0700)]
ad525x_dpot: add support for ADN2860 and AD528x pots
New parts supported:
AD5280, AD5282, ADN2860
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Hennerich [Mon, 24 May 2010 21:33:15 +0000 (14:33 -0700)]
ad525x_dpot: add support for AD524x pots
New parts supported:
AD5241, AD5242, AD5243, AD5245, AD5246, AD5247, AD5248
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Hennerich [Mon, 24 May 2010 21:33:14 +0000 (14:33 -0700)]
ad525x_dpot: add support for SPI parts
Split the bus logic out into separate files so that we can handle I2C and
SPI busses independently. The new SPI bus logic brings in support for a
lot more parts:
AD5160, AD5161, AD5162, AD5165, AD5200, AD5201, AD5203,
AD5204, AD5206, AD5207, AD5231, AD5232, AD5233, AD5235,
AD5260, AD5262, AD5263, AD5290, AD5291, AD5292, AD5293,
AD7376, AD8400, AD8402, AD8403, ADN2850
[randy.dunlap@oracle.com: fix ad525X_dpot build]
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Hennerich [Mon, 24 May 2010 21:33:13 +0000 (14:33 -0700)]
ad525x_dpot: extend write argument to 16bits
The possible output data is 16bits, not 8bits, so don't truncate it.
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michael Hennerich [Mon, 24 May 2010 21:33:13 +0000 (14:33 -0700)]
ad525x_dpot: simplify duplicated sysfs defines
Macro away the duplication to make maintenance easier.
Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wenji Huang [Mon, 24 May 2010 21:33:12 +0000 (14:33 -0700)]
module: remove duplicate declaration of __ksymtab_gpl_future
Minor cleanup on duplicate __{start/stop}__ksymtab_gpl_future.
Signed-off-by: Wenji Huang <wenji.huang@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
OGAWA Hirofumi [Mon, 24 May 2010 21:33:12 +0000 (14:33 -0700)]
fatfs: ratelimit corruption report
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
OGAWA Hirofumi [Mon, 24 May 2010 21:33:11 +0000 (14:33 -0700)]
ratelimit: add ratelimit_state_init()
For now, all users of ratelimit_state allocate it statically, so
DEFINE_RATELIMIT_STATE() is enough. But I want to use ratelimit_state
in filesystems, i.e. per super_block, to suppress excessive error reports.
So this adds ratelimit_state_init() to initialize a dynamically allocated
ratelimit_state, instead of open-coding it.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
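The difference between a static initializer and a runtime init function, as motivated in the ratelimit commit above, can be sketched in userspace. Everything below is hypothetical (struct ratelimit_sketch, fake_super_block, the tick-based time argument), and unlike the kernel structure it has no lock; it only shows why dynamically allocated state needs an init call.

```c
#include <stdlib.h>

struct ratelimit_sketch {
	long interval;	/* window length in caller-defined ticks */
	int burst;	/* reports allowed per window */
	int printed;	/* reports issued in the current window */
	int started;	/* 0 until the first event is seen */
	long begin;	/* start of the current window */
};

/* Runtime counterpart of a static initializer macro. */
void ratelimit_sketch_init(struct ratelimit_sketch *rs, long interval, int burst)
{
	rs->interval = interval;
	rs->burst = burst;
	rs->printed = 0;
	rs->started = 0;
	rs->begin = 0;
}

/* Returns 1 if an event at time now may be reported, 0 if suppressed. */
int ratelimit_sketch_allow(struct ratelimit_sketch *rs, long now)
{
	if (!rs->started || now - rs->begin >= rs->interval) {
		rs->started = 1;
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return 1;
	}
	return 0;
}

/* Per-object state, e.g. one limiter per mounted filesystem. */
struct fake_super_block {
	struct ratelimit_sketch err_limit;
};

struct fake_super_block *fake_sb_alloc(void)
{
	struct fake_super_block *sb = malloc(sizeof(*sb));

	if (sb)
		ratelimit_sketch_init(&sb->err_limit, 5, 2);
	return sb;
}
```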
OGAWA Hirofumi [Mon, 24 May 2010 21:33:11 +0000 (14:33 -0700)]
printk_ratelimited(): fix uninitialized spinlock
ratelimit_state initialization of printk_ratelimited() seems broken. This
fixes it by using DEFINE_RATELIMIT_STATE() to initialize spinlock
properly.
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Samu Onkalo [Mon, 24 May 2010 21:33:10 +0000 (14:33 -0700)]
drivers: misc: pass miscdevice pointer via file private data
For misc devices, inode->i_cdev doesn't point to the device driver's own
data, so the link between file operations and device driver internal data
is lost. Pass a pointer to the misc device struct via file private data for
the driver's open function to use.
Signed-off-by: Samu Onkalo <samu.p.onkalo@nokia.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
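How a driver's open function might use the pointer described in the miscdevice commit above can be sketched in userspace with a container_of-style macro. The types fake_miscdevice, fake_file and my_driver_data are hypothetical stand-ins, not the kernel structures.

```c
#include <stddef.h>

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of_sketch(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

struct fake_miscdevice {
	int minor;
};

struct fake_file {
	void *private_data;
};

/* Driver-internal data that embeds the misc device structure. */
struct my_driver_data {
	int open_count;
	struct fake_miscdevice misc;
};

/*
 * Assumes the caller stored a pointer to the embedded fake_miscdevice in
 * file->private_data before calling open, as the commit describes.
 */
int my_open(struct fake_file *file)
{
	struct fake_miscdevice *misc = file->private_data;
	struct my_driver_data *data =
		container_of_sketch(misc, struct my_driver_data, misc);

	data->open_count++;
	file->private_data = data;	/* later file operations see driver data */
	return 0;
}
```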
Andrew Morton [Mon, 24 May 2010 21:33:10 +0000 (14:33 -0700)]
include/asm-generic/kmap_types.h: add helpful reminder
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Peter Fritzsche [Mon, 24 May 2010 21:33:09 +0000 (14:33 -0700)]
asm-generic: don't warn that atomic_t is only 24 bit
32-bit Sparc used to allow use of only 24 bits of its atomic_t type.
This was corrected in Linux 2.6.3 when Keith M Wesolowski changed the
implementation to use the parisc approach of having an array of spinlocks
to protect the atomic_t.
These warnings were also removed from the sparc implementation when the
new implementation was merged in BKrev:402e4949VThdc6D3iaosSFUgabMfvw, but
the warning still remained in some other places without any 24-bit-only
atomic_t implementation inside the kernel.
We should remove these warnings to allow users to rely on the full 32-bit
range of atomic_t.
Signed-off-by: Peter Fritzsche <peter.fritzsche@gmx.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Mon, 24 May 2010 21:33:08 +0000 (14:33 -0700)]
kernel.h: add pr_warn for symmetry to dev_warn, netdev_warn
The current logging macros are
pr_<level>, dev_<level>, netdev_<level>, and netif_<level>.
pr_ uses warning, the others use warn.
Standardize these logging macros a bit more by adding pr_warn and
pr_warn_ratelimited.
Right now, there are:
$ for level in emerg alert crit err warn warning notice info ; do \
for prefix in pr dev netdev netif ; do \
echo -n "${prefix}_${level}: `git grep -w "${prefix}_${level}" | wc -l` " ; \
done ; \
echo ; \
done
pr_emerg: 45 dev_emerg: 4 netdev_emerg: 1 netif_emerg: 4
pr_alert: 24 dev_alert: 36 netdev_alert: 1 netif_alert: 6
pr_crit: 24 dev_crit: 22 netdev_crit: 1 netif_crit: 4
pr_err: 2013 dev_err: 8467 netdev_err: 267 netif_err: 240
pr_warn: 0 dev_warn: 1818 netdev_warn: 126 netif_warn: 23
pr_warning: 773 dev_warning: 0 netdev_warning: 0 netif_warning: 0
pr_notice: 148 dev_notice: 111 netdev_notice: 9 netif_notice: 3
pr_info: 1717 dev_info: 3007 netdev_info: 101 netif_info: 85
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Mon, 24 May 2010 21:33:07 +0000 (14:33 -0700)]
ntfs: use add_to_page_cache_lru()
Quoting Nick Piggin on the btrfs patch
- http://www.mail-archive.com/linux-btrfs@vger.kernel.org/msg04472.html.
"add_to_page_cache_lru is exported, so it should be used. Benefits over
using a private pagevec: neater code, 128 bytes fewer stack used, percpu
lru ordering is preserved, and finally don't need to flush pagevec
before returning so batching may be shared with other LRU insertions."
Let's use it instead of private pagevec in ntfs, too.
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Anton Altaparmakov <aia21@cantab.net>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Minchan Kim [Mon, 24 May 2010 21:33:06 +0000 (14:33 -0700)]
ntfs: clean up ntfs_attr_extend_initialized
cached_page and lru_pvec have not been used. Let's remove the arguments.
Signed-off-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Anton Altaparmakov <aia21@cantab.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Alex Riesen [Mon, 24 May 2010 21:33:05 +0000 (14:33 -0700)]
sunrpc: use formatting of module name in SUNRPC
gcc-4.3.3 produces the warning:
"format not a string literal and no format arguments"
Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <cel@citi.umich.edu>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Tom Talpey <tmtalpey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
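The gcc warning quoted in the sunrpc commit above appears when a non-literal string is passed as a printf-style format. A minimal userspace sketch of the usual remedy, passing the string as an argument to "%s"; copy_module_name() is a hypothetical helper and snprintf() stands in for whichever format-taking function the kernel code calls.

```c
#include <stdio.h>

/* Copies name into out without ever interpreting it as a format string. */
int copy_module_name(char *out, size_t size, const char *name)
{
	/*
	 * Form that draws the warning, and misbehaves if name contains '%':
	 *
	 *     snprintf(out, size, name);
	 */
	return snprintf(out, size, "%s", name);
}
```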
Phil Carmody [Mon, 24 May 2010 21:33:04 +0000 (14:33 -0700)]
hvsi: fix messed up error checking getting state name
Handle out-of-range indices before reading what they refer to. And don't
access the one-past-the-end element of the array either.
Signed-off-by: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Roel Kluin <roel.kluin@gmail.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
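The two points in the hvsi commit above (check the index before reading, and treat the one-past-the-end index as out of range) look roughly like this. The table contents and names are hypothetical.

```c
#include <stddef.h>

#define ARRAY_SIZE_SKETCH(a) (sizeof(a) / sizeof((a)[0]))

static const char *const state_names[] = {
	"closed",
	"opening",
	"open",
};

/* Range-check first, using >= so index ARRAY_SIZE itself is rejected. */
const char *state_name_sketch(unsigned int state)
{
	if (state >= ARRAY_SIZE_SKETCH(state_names))
		return "UNKNOWN";
	return state_names[state];
}
```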
Alexey Dobriyan [Mon, 24 May 2010 21:33:03 +0000 (14:33 -0700)]
kernel-wide: replace USHORT_MAX, SHORT_MAX and SHORT_MIN with USHRT_MAX, SHRT_MAX and SHRT_MIN
- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not
USHORT_MAX/SHORT_MAX/SHORT_MIN.
- Make SHRT_MIN of type s16, not int, for consistency.
[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yury Polyanskiy [Mon, 24 May 2010 21:33:02 +0000 (14:33 -0700)]
hangcheck-timer: fix x86_32 bugs
drivers/char/hangcheck-timer.c is doubly broken. When the overflowed value
of TIMER_FREQ is abnormally low, it spams the syslog with KERN_CRIT
messages "Hangcheck: hangcheck value past margin!" But whether this happens
or not depends on HZ and lpj in a complex way. People have hit it
occasionally, as far as a Google search can tell.
First, the following line overflows unsigned long:
# define TIMER_FREQ (HZ*loops_per_jiffy)
Second, and more importantly, loops_per_jiffy has little to do with the
conversion from the time scale of get_cycles() (aka rdtsc) to the
time scale of jiffies.
The attached patch resolves both problems.
Acked-by: Joel Becker <joel.becker@oracle.com>
Cc: john stultz <johnstul@us.ibm.com>
Cc: Jan Glauber <jan.glauber@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
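The first problem named in the hangcheck-timer commit above, the product overflowing a 32-bit unsigned long, can be reproduced with fixed-width types. The input values used in the note below are illustrative assumptions, and the 64-bit variant only shows the intended value; how the patch itself resolves the problem is not shown in this log.

```c
#include <stdint.h>

/* 32-bit product, as unsigned long behaves on x86_32: wraps modulo 2^32. */
uint32_t timer_freq_32(uint32_t hz, uint32_t loops_per_jiffy)
{
	return hz * loops_per_jiffy;
}

/* Same product computed in 64 bits, for comparison. */
uint64_t timer_freq_64(uint32_t hz, uint32_t loops_per_jiffy)
{
	return (uint64_t)hz * loops_per_jiffy;
}
```

With HZ = 1000 and loops_per_jiffy = 5000000 the true product is 5000000000, which exceeds 2^32 - 1 = 4294967295; the 32-bit result wraps to 5000000000 - 4294967296 = 705032704.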
Joakim Tjernlund [Mon, 24 May 2010 21:33:01 +0000 (14:33 -0700)]
endian: #define __BYTE_ORDER
Linux does not define __BYTE_ORDER in its endian header files, which makes
some header files bend over backwards to determine the current endianness.
Let's #define __BYTE_ORDER in big_endian.h/little_endian.h to make it
easier for header files that are used in user space too.
In userspace the convention is that
1. _both_ __LITTLE_ENDIAN and __BIG_ENDIAN are defined,
2. you have to test for e.g. __BYTE_ORDER == __BIG_ENDIAN.
Signed-off-by: Joakim Tjernlund <Joakim.Tjernlund@transmode.se>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
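The userspace convention listed in the endian commit above can be sketched as follows, assuming a libc that provides <endian.h> (glibc does). The function names are hypothetical; the runtime probe is there only to cross-check the compile-time answer.

```c
#include <endian.h>	/* assumption: glibc-style header defining __BYTE_ORDER */
#include <stdint.h>

/*
 * Both __LITTLE_ENDIAN and __BIG_ENDIAN are defined in userspace, so
 * "#ifdef __BIG_ENDIAN" would always be true; compare against __BYTE_ORDER.
 */
int compiled_for_big_endian(void)
{
#if __BYTE_ORDER == __BIG_ENDIAN
	return 1;
#else
	return 0;
#endif
}

/* Runtime check: is the first byte in memory the most significant one? */
int runtime_is_big_endian(void)
{
	const uint16_t probe = 0x0102;

	return *(const unsigned char *)&probe == 0x01;
}
```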
Jani Nikula [Mon, 24 May 2010 21:33:00 +0000 (14:33 -0700)]
err.h: add __must_check to error pointer handlers
Add __must_check to error pointer handlers to have the compiler warn about
mistakes like:
if (err)
ERR_PTR(err);
It found two bugs:
Mar 12 Nikula Jani [PATCH] enclosure: fix error path - actually return ERR_PTR() on error
Mar 12 Nikula Jani [PATCH] sunrpc: fix error path - actually return ERR_PTR() on error
Signed-off-by: Jani Nikula <ext-jani.1.nikula@nokia.com>
Cc: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
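A userspace sketch of the effect described in the err.h commit above, assuming GCC or Clang, where a "must check" annotation can be expressed with the warn_unused_result attribute. All names are hypothetical stand-ins for the kernel helpers, the 4095 limit is an assumption of this sketch, and -12 is used as an example error code.

```c
#include <stdint.h>

#define must_check_sketch __attribute__((warn_unused_result))
#define MAX_ERRNO_SKETCH 4095	/* assumed size of the error-code range */

/* Encode a negative error code in a pointer value. */
static inline must_check_sketch void *err_ptr_sketch(long error)
{
	return (void *)(intptr_t)error;
}

/* Decode the error code again. */
static inline must_check_sketch long ptr_err_sketch(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* Error pointers occupy the top MAX_ERRNO_SKETCH addresses. */
static inline must_check_sketch int is_err_sketch(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO_SKETCH;
}

void *lookup_sketch(int fail)
{
	static int object;

	if (fail)
		return err_ptr_sketch(-12);	/* dropping "return" here would now warn */
	return &object;
}
```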
Arjan van de Ven [Mon, 24 May 2010 21:32:59 +0000 (14:32 -0700)]
cpuidle: add a repeating pattern detector to the menu governor
Currently, the menu governor uses the (corrected) next timer as key item
for predicting the idle duration.
It turns out that there are specific cases where this breaks down:
sometimes we have a very repetitive pattern of idle durations, where
the idle period is pretty much the same, for reasons completely unrelated
to the next timer event. Examples of such repeating patterns are network
loads with irq mitigation, mouse movement and, in theory, wifi beacons.
This patch adds a relatively simple detector for such repeating patterns,
where the standard deviation of the last 8 idle periods is compared to a
threshold.
With this extra predictor in place, measurements show that the DECAY
factor can now be increased (the decaying average will now decay slower)
to get an even more stable result.
[arjan@infradead.org: fix bug identified by Frank]
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Cc: Corrado Zoccolo <czoccolo@gmail.com>
Cc: Frank Rowand <frank.rowand@am.sony.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
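The detector described in the cpuidle commit above can be sketched in integer arithmetic. This is not the governor's code: the names are hypothetical, the 400-microsecond threshold is an assumed value, and the sketch compares the variance against the squared threshold to avoid a square root.

```c
#include <stdint.h>

#define INTERVALS_SKETCH 8
#define STDDEV_THRESH_SKETCH 400	/* microseconds; assumed value */

/*
 * Returns the average of the last 8 idle durations if they look like a
 * repeating pattern (standard deviation below the threshold), else 0.
 */
uint32_t repeating_pattern_sketch(const uint32_t us[INTERVALS_SKETCH])
{
	uint64_t sum = 0, sq_sum = 0, avg, variance;
	int i;

	for (i = 0; i < INTERVALS_SKETCH; i++)
		sum += us[i];
	avg = sum / INTERVALS_SKETCH;

	for (i = 0; i < INTERVALS_SKETCH; i++) {
		int64_t diff = (int64_t)us[i] - (int64_t)avg;

		sq_sum += (uint64_t)(diff * diff);
	}
	variance = sq_sum / INTERVALS_SKETCH;

	/* stddev < T is equivalent to variance < T * T for non-negative values */
	if (avg != 0 &&
	    variance < (uint64_t)STDDEV_THRESH_SKETCH * STDDEV_THRESH_SKETCH)
		return (uint32_t)avg;
	return 0;
}
```

A caller would use a non-zero result as the predicted idle duration in place of the timer-based estimate.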
FUJITA Tomonori [Mon, 24 May 2010 21:32:58 +0000 (14:32 -0700)]
mn10300: set ARCH_KMALLOC_MINALIGN
Architectures that handle DMA-non-coherent memory need to set
ARCH_KMALLOC_MINALIGN to make sure that a kmalloc'ed buffer is DMA-safe:
the buffer doesn't share a cache line with other buffers.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: David Howells <dhowells@redhat.com>
Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mathieu Desnoyers [Mon, 24 May 2010 21:32:56 +0000 (14:32 -0700)]
mn10300: use generic atomic.h
asm-generic/atomic.h has been derived from the mn10300 implementation.
Remove the now duplicated mn10300 implementation by including the generic
version instead.
This adds cmpxchg_local() and cmpxchg64_local() for free to the
architecture, as they are implemented in asm-generic/atomic.h.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Peter Fritzsche <peter.fritzsche@gmx.de>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jamie Lokier <jamie@shareable.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Keith M Wesolowski <wesolows@foobazco.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Greg Ungerer [Mon, 24 May 2010 21:32:55 +0000 (14:32 -0700)]
m68knommu: fix broken use of BAUD_TABLE_SIZE in 68328serial driver
Commit 8b505ca8e2600eb9e7dd2d6b2682a81717671374 ("serial: 68328serial.c:
remove BAUD_TABLE_SIZE macro") missed one use of BAUD_TABLE_SIZE, so the
resulting 68328serial.c does not compile:
drivers/serial/68328serial.c: In function `m68328_console_setup':
drivers/serial/68328serial.c:1439: error: `BAUD_TABLE_SIZE' undeclared (first use in this function)
drivers/serial/68328serial.c:1439: error: (Each undeclared identifier is reported only once
drivers/serial/68328serial.c:1439: error: for each function it appears in.)
Fix that last use of it.
Signed-off-by: Greg Ungerer <gerg@uclinux.org>
Cc: Thiago Farina <tfransosi@gmail.com>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
FUJITA Tomonori [Mon, 24 May 2010 21:32:54 +0000 (14:32 -0700)]
frv: set ARCH_KMALLOC_MINALIGN
Architectures that handle DMA-non-coherent memory need to set
ARCH_KMALLOC_MINALIGN to make sure that kmalloc'ed buffer is DMA-safe: the
buffer doesn't share a cache with the others.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: David Howells <dhowells@redhat.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Howells [Mon, 24 May 2010 21:32:54 +0000 (14:32 -0700)]
frv: extend gdbstub to support more features of gdb
Extend gdbstub to support more features of gdb remote protocol to keep
gdb-7 and emacs gud mode happy:
(*) The D command. Detach debugger.
(*) The H command. Handle setting the target thread by ignoring it.
(*) The qAttached command. Indicate we 'attached' to an existing process.
(*) The qC command. Indicate that the current thread ID is 0.
(*) The qOffsets command. Indicate that no relocation has been done.
(*) The qSymbol:: command. Indicate that we're not interested in looking up
any symbol addresses.
(*) The qSupported command. Indicate the maximum packet size and the fact
that reverse step and continue aren't supported.
(*) The vCont? command. Indicate that we don't support any of its variants.
Also make it possible to trace the commands and replies without tracing
the individual character I/O.
[akpm@linux-foundation.org: make gdbstub_handle_query() static]
Signed-off-by: David Howells <dhowells@redhat.com>
Cc: Jason Wessel <jason.wessel@windriver.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Metcalf [Mon, 24 May 2010 21:32:53 +0000 (14:32 -0700)]
mm: make lowmem_page_address() use PFN_PHYS() for improved portability
This ensures that platforms with lowmem PAs above 32 bits work correctly
by avoiding truncating the PA during a left shift.
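The truncation this commit avoids can be reproduced in userspace. The
following is a minimal sketch, not the kernel code: it assumes 4 KiB pages
(PAGE_SHIFT of 12), uses uint32_t to stand in for a 32 bit unsigned long,
and the two helper names are hypothetical.

```c
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Shift performed in 32 bits: the high bits of the address are lost. */
uint64_t pfn_to_phys_truncating(uint32_t pfn)
{
    return (uint32_t)(pfn << PAGE_SHIFT);
}

/* Widen to 64 bits first, then shift: the full address survives. */
uint64_t pfn_to_phys_widened(uint32_t pfn)
{
    return (uint64_t)pfn << PAGE_SHIFT;
}
```

With a page frame number of 0x100000 (physical address 4 GiB), the first
helper yields 0 while the second yields 0x100000000.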
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Cc: Barry Song <21cnbao@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Haicheng Li [Mon, 24 May 2010 21:32:52 +0000 (14:32 -0700)]
mem-hotplug: fix potential race while building zonelist for new populated zone
Add global mutex zonelists_mutex to fix the possible race:
    CPU0                                  CPU1                CPU2
(1) zone->present_pages += online_pages;
(2) build_all_zonelists();
                                      (3) alloc_page();
                                                          (4) free_page();
(5) build_all_zonelists();
(6)   __build_all_zonelists();
(7)     zone->pageset = alloc_percpu();
In steps (3) and (4), zone->pageset still points to boot_pageset, so bad
things may happen if 2+ nodes are in this state. Even if only 1 node
is accessing the boot_pageset, (3) may still consume so much memory
that the memory allocations in step (7) fail.
Besides, atomic operation ensures alloc_percpu() in step (7) will never fail
since there is a new fresh memory block added in step (6).
[haicheng.li@linux.intel.com: hold zonelists_mutex when build_all_zonelists]
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Andi Kleen <andi.kleen@intel.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Haicheng Li [Mon, 24 May 2010 21:32:51 +0000 (14:32 -0700)]
mem-hotplug: avoid multiple zones sharing same boot strapping boot_pageset
For each new populated zone of hotadded node, need to update its pagesets
with dynamically allocated per_cpu_pageset struct for all possible CPUs:
1) Detach zone->pageset from the shared boot_pageset
at end of __build_all_zonelists().
2) Use mutex to protect zone->pageset when it's still
shared in onlined_pages()
Otherwise, multiple zones of different nodes would share the same
bootstrapping boot_pageset for the same CPU, which will finally cause the
kernel panic below:
------------[ cut here ]------------
kernel BUG at mm/page_alloc.c:1239!
invalid opcode: 0000 [#1] SMP
...
Call Trace:
[<ffffffff811300c1>] __alloc_pages_nodemask+0x131/0x7b0
[<ffffffff81162e67>] alloc_pages_current+0x87/0xd0
[<ffffffff81128407>] __page_cache_alloc+0x67/0x70
[<ffffffff811325f0>] __do_page_cache_readahead+0x120/0x260
[<ffffffff81132751>] ra_submit+0x21/0x30
[<ffffffff811329c6>] ondemand_readahead+0x166/0x2c0
[<ffffffff81132ba0>] page_cache_async_readahead+0x80/0xa0
[<ffffffff8112a0e4>] generic_file_aio_read+0x364/0x670
[<ffffffff81266cfa>] nfs_file_read+0xca/0x130
[<ffffffff8117b20a>] do_sync_read+0xfa/0x140
[<ffffffff8117bf75>] vfs_read+0xb5/0x1a0
[<ffffffff8117c151>] sys_read+0x51/0x80
[<ffffffff8103c032>] system_call_fastpath+0x16/0x1b
RIP [<ffffffff8112ff13>] get_page_from_freelist+0x883/0x900
RSP <ffff88000d1e78a8>
---[ end trace 4bda28328b9990db ]---
[akpm@linux-foundation.org: merge fix]
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: Andi Kleen <andi.kleen@intel.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wu Fengguang [Mon, 24 May 2010 21:32:49 +0000 (14:32 -0700)]
mem-hotplug: separate setup_per_cpu_pageset() into separate functions
No behavior change here.
Move some of setup_per_cpu_pageset() code into a new function
setup_zone_pageset() that will be useful for memory hotplug.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Haicheng Li <haicheng.li@linux.intel.com>
Reviewed-by: Andi Kleen <andi.kleen@intel.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Marcelo Roberto Jimenez [Mon, 24 May 2010 21:32:47 +0000 (14:32 -0700)]
mm: fix NR_SECTION_ROOTS == 0 when using sparsemem extreme.
Got this while compiling for ARM/SA1100:
mm/sparse.c: In function '__section_nr':
mm/sparse.c:135: warning: 'root' is used uninitialized in this function
This patch follows Russell King's suggestion for a new calculation for
NR_SECTION_ROOTS. Thanks also to Sergei Shtylyov for pointing out the
existence of the macro DIV_ROUND_UP.
Atsushi Nemoto observed:
: This fix doesn't just silence the warning - it fixes a real problem.
:
: Without this fix, mem_section[] might have 0 size so mem_section[0]
: will share other variable area. For example, I got:
:
: c030c700 b __warned.16478
: c030c700 B mem_section
: c030c701 b __warned.16483
:
: This might cause very strange behavior. Your patch actually fixes it.
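The arithmetic behind the zero-sized array can be shown with a small
sketch. It assumes the old calculation was a plain integer division and the
new one uses DIV_ROUND_UP as the message says; the function names and the
section counts used below are illustrative, not taken from the patch.

```c
/* Same shape as the kernel's DIV_ROUND_UP helper macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/* Plain division: yields 0 when there are fewer sections than fit in a root. */
unsigned long roots_truncating(unsigned long nr_sections,
                               unsigned long sections_per_root)
{
    return nr_sections / sections_per_root;
}

/* Rounded-up division: always at least 1 root when any section exists. */
unsigned long roots_rounded_up(unsigned long nr_sections,
                               unsigned long sections_per_root)
{
    return DIV_ROUND_UP(nr_sections, sections_per_root);
}
```

With 32 sections and 512 sections per root (illustrative numbers), the
truncating form yields 0 roots and the rounded-up form yields 1.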
Signed-off-by: Marcelo Roberto Jimenez <mroberto@cpti.cetuc.puc-rio.br>
Cc: Atsushi Nemoto <anemo@mba.ocn.ne.jp>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Sergei Shtylyov <sshtylyov@mvista.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Mon, 24 May 2010 21:32:46 +0000 (14:32 -0700)]
highmem: remove unneeded #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT for debug_kmap_atomic()
In f4112de6b679d84bd9b9681c7504be7bdfb7c7d5 ("mm: introduce
debug_kmap_atomic") I said that debug_kmap_atomic() needs
CONFIG_TRACE_IRQFLAGS_SUPPORT.
It was wrong. (I thought irqs_disabled() is only available when the
architecture has CONFIG_TRACE_IRQFLAGS_SUPPORT)
Remove the #ifdef CONFIG_TRACE_IRQFLAGS_SUPPORT check to enable
kmap_atomic() debugging for the architectures which do not have
CONFIG_TRACE_IRQFLAGS_SUPPORT.
Reported-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
matt mooney [Mon, 24 May 2010 21:32:45 +0000 (14:32 -0700)]
include/linux/gfp.h: fix coding style
Add parentheses in a define. This doesn't change functionality.
checkpatch errors:
1) white space fixes
2) add spaces after commas
Signed-off-by: matt mooney <mfm@muteddisk.com>
Cc: Dan Carpenter <error27@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
matt mooney [Mon, 24 May 2010 21:32:44 +0000 (14:32 -0700)]
include/linux/gfp.h: spelling fixes
Fix minor spelling errors in a few comments; no code changes.
Signed-off-by: matt mooney <mfm@muteddisk.com>
Cc: Dan Carpenter <error27@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
minskey guo [Mon, 24 May 2010 21:32:41 +0000 (14:32 -0700)]
cpu/mem hotplug: enable CPUs online before local memory online
Enable users to online CPUs even if the CPUs belong to a numa node which
doesn't have onlined local memory.
The zonelists (pg_data_t.node_zonelists[]) of a numa node are created either
in system boot/init period, or at the time of local memory online. For a
numa node without onlined local memory, its zonelists are not initialized
at present. As a result, any memory allocation operations executed by
CPUs within this node will fail. In fact, an out-of-memory error is
triggered when attempting to online CPUs before memory comes online.
This patch tries to create zonelists for such numa nodes, so that the
memory allocation for this node can fall back to other nodes.
[akpm@linux-foundation.org: remove unneeded export]
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: minskey guo<chaohong.guo@intel.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Mon, 24 May 2010 21:32:40 +0000 (14:32 -0700)]
vmscan: remove isolate_pages callback scan control
For now, we have global isolation vs. memory control group isolation. Do
not allow the reclaim entry function to set an arbitrary page isolation
callback; we do not need that flexibility.
And since we already pass around the group descriptor for the memory
control group isolation case, just use it to decide which one of the two
isolator functions to use.
The decisions can be merged into nearby branches, so no extra cost there.
In fact, we save the indirect calls.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Mon, 24 May 2010 21:32:40 +0000 (14:32 -0700)]
vmscan: remove all_unreclaimable scan control
This scan control is abused to communicate a return value from
shrink_zones(). Write this idiomatically and remove the knob.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Mon, 24 May 2010 21:32:39 +0000 (14:32 -0700)]
mm: document follow_page()
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Dan Carpenter <error27@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Izik Eidus <ieidus@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Kennedy [Mon, 24 May 2010 21:32:38 +0000 (14:32 -0700)]
fs-writeback: check sync bit earlier in inode_wait_for_writeback
When wb_writeback() hasn't written anything it will re-acquire the inode
lock before calling inode_wait_for_writeback.
This change tests the sync bit first so that it doesn't need to drop &
re-acquire the lock if the inode became available while wb_writeback() was
waiting to get the lock.
Signed-off-by: Richard Kennedy <richard@rsk.demon.co.uk>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Mon, 24 May 2010 21:32:38 +0000 (14:32 -0700)]
mm: introduce free_pages_prepare()
free_hot_cold_page() and __free_pages_ok() have very similar freeing
preparation. Consolidate them.
[akpm@linux-foundation.org: fix busted coding style]
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KOSAKI Motohiro [Mon, 24 May 2010 21:32:37 +0000 (14:32 -0700)]
vmscan: page_check_references(): check low order lumpy reclaim properly
If vmscan is in lumpy reclaim mode, it has to ignore the referenced bit
in order to make contiguous free pages, but the current
page_check_references() doesn't.
Fix it.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Huang Shijie [Mon, 24 May 2010 21:32:36 +0000 (14:32 -0700)]
readahead.c: fix comment
Fix a wrong comment over page_cache_async_readahead().
Signed-off-by: Huang Shijie <shijie8@gmail.com>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shaohua Li [Mon, 24 May 2010 21:32:36 +0000 (14:32 -0700)]
vmscan: prevent get_scan_ratio() rounding errors
get_scan_ratio() calculates a percentage, and if the percentage is < 1%, it
rounds the percentage down to 0%, which causes us to completely ignore
scanning anon/file pages to reclaim memory even if the total anon/file
pages are very big.
To avoid underflow, we don't use a percentage; instead we directly calculate
how many pages should be scanned. In this way, we should get several
scanned pages even when the percentage is < 1%.
This has some benefits:
1. increases our calculation precision
2. makes our scan smoother. Without this, if percent[x]
underflows, shrink_zone() doesn't scan any pages and then suddenly scans
all pages when priority is zero. With this, even if priority isn't zero,
shrink_zone() gets a chance to scan some pages.
Note, this patch doesn't really change the logic, but just increases
precision. For systems with a lot of memory, this might slightly change
behavior. For example, in a sequential file read workload, without the
patch, we don't swap any anon pages. With it, if anon memory size is
bigger than 16G, we will see one anon page swapped. The 16G is calculated
as PAGE_SIZE * priority(4096) * (fp/ap). fp/ap is assumed to be 1024
which is common in this workload. So the impact should not be a big deal.
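The rounding problem can be sketched in userspace as follows. This is an
illustration of the two calculation styles, not the kernel code: the
function names, the parameter names (weight, total_weight) and the way the
scan target is derived from lru_pages >> priority are assumptions made for
the example.

```c
#include <stdint.h>

/* Old style: reduce the ratio to an integer percentage first. */
uint64_t scan_via_percent(uint64_t lru_pages, unsigned int priority,
                          uint64_t weight, uint64_t total_weight)
{
    uint64_t percent = 100 * weight / total_weight; /* rounds down, maybe to 0 */

    return ((lru_pages >> priority) * percent) / 100;
}

/* New style: multiply by the fraction, divide once at the end. */
uint64_t scan_via_fraction(uint64_t lru_pages, unsigned int priority,
                           uint64_t weight, uint64_t total_weight)
{
    return ((lru_pages >> priority) * weight) / total_weight;
}
```

With 2^23 pages, priority 12 and a weight of 1 out of 1025 (roughly the
fp/ap = 1024 case above), the percentage form scans 0 pages while the
fraction form scans 1.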
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Greg Thelen [Mon, 24 May 2010 21:32:33 +0000 (14:32 -0700)]
mm: consider the entire user address space during node migration
Use mm->task_size instead of TASK_SIZE to ensure that the entire user
address space is migrated. mm->task_size is independent of the calling
task context. TASK_SIZE may be dependent on the address space size of the
calling process. Usage of TASK_SIZE can lead to partial address space
migration if the calling process was 32 bit and the migrating process was
64 bit.
Here is the test script used on a 64 bit system with a 32 bit echo process:
mount -t cgroup none /cgroup -o cpuset
cd /cgroup
mkdir 0
echo 1 > 0/cpuset.cpus
echo 0 > 0/cpuset.mems
echo 1 > 0/cpuset.memory_migrate
mkdir 1
echo 1 > 1/cpuset.cpus
echo 1 > 1/cpuset.mems
echo 1 > 1/cpuset.memory_migrate
echo $$ > 0/tasks
64_bit_process &
pid=$!
echo $pid > 1/tasks # This does not migrate all process pages without
# this patch. If 64 bit echo is used or this patch is
# applied, then the full address space of $pid is
# migrated.
To check memory migration, I watched:
grep MemUsed /sys/devices/system/node/node*/meminfo
Signed-off-by: Greg Thelen <gthelen@google.com>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Cc: Balbir Singh <balbir@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Mon, 24 May 2010 21:32:32 +0000 (14:32 -0700)]
mm: compaction: defer compaction using an exponential backoff when compaction fails
The fragmentation index may indicate that a failure is due to external
fragmentation but after a compaction run completes, it is still possible
for an allocation to fail. There are two obvious reasons as to why:
o Page migration cannot move all pages so fragmentation remains
o A suitable page may exist but watermarks are not met
In the event of compaction followed by an allocation failure, this patch
defers further compaction in the zone (1 << compact_defer_shift) times.
If the next compaction attempt also fails, compact_defer_shift is
increased up to a maximum of 6. If compaction succeeds, the defer
counters are reset again.
The zone that is deferred is the first zone in the zonelist - i.e. the
preferred zone. To defer compaction in the other zones, the information
would need to be stored in the zonelist or implemented similar to the
zonelist_cache. This would impact the fast-paths and is not justified at
this time.
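The backoff described above can be modelled in a few lines. This is a
userspace sketch under stated assumptions, not the patch itself: struct
zone_model, its field names and the three function names are hypothetical,
and the exact number of attempts skipped per window is a property of this
model only.

```c
#include <stdbool.h>

#define MAX_DEFER_SHIFT 6 /* maximum shift named in the text */

/* Hypothetical per-zone state; field names are illustrative. */
struct zone_model {
    unsigned int considered;  /* attempts seen since the last failure */
    unsigned int defer_shift; /* log2 of the current backoff window */
};

/* Called after compaction ran but the allocation still failed. */
void defer_compaction(struct zone_model *z)
{
    z->considered = 0;
    if (z->defer_shift < MAX_DEFER_SHIFT)
        z->defer_shift++;
}

/* Called after compaction led to a successful allocation. */
void compaction_succeeded(struct zone_model *z)
{
    z->considered = 0;
    z->defer_shift = 0;
}

/* Returns true while the zone is still inside its backoff window. */
bool compaction_deferred(struct zone_model *z)
{
    unsigned int limit = 1U << z->defer_shift;

    if (z->considered < limit)
        z->considered++;
    return z->considered < limit;
}
```

A caller would check compaction_deferred() before each attempt, call
defer_compaction() when an attempt does not help, and call
compaction_succeeded() when it does, so repeated failures widen the window
up to 1 << 6 and a single success resets it.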
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Mon, 24 May 2010 21:32:31 +0000 (14:32 -0700)]
mm: compaction: add a tunable that decides when memory should be compacted and when it should be reclaimed
The kernel applies some heuristics when deciding if memory should be
compacted or reclaimed to satisfy a high-order allocation. One of these
is based on the fragmentation. If the index is below 500, memory will not
be compacted. This choice is arbitrary and not based on data. To help
optimise the system and set a sensible default for this value, this patch
adds a sysctl extfrag_threshold. The kernel will only compact memory if
the fragmentation index is above the extfrag_threshold.
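The threshold check can be sketched as below. It assumes the index is
handled as an integer scaled by 1000, which is inferred from the value 500
in the text; the function name is hypothetical and this is not the kernel's
decision code.

```c
#include <stdbool.h>

/* Value named in the text as the previous hard-coded cut-off. */
#define DEFAULT_EXTFRAG_THRESHOLD 500

/*
 * Compact only when the fragmentation index is above the threshold.
 * Any index at or below the threshold, including a negative one,
 * yields false in this sketch.
 */
bool should_compact(int fragindex, int extfrag_threshold)
{
    return fragindex > extfrag_threshold;
}
```

Raising the threshold makes the sketch favour reclaim; lowering it makes
compaction more likely for the same index.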
[randy.dunlap@oracle.com: Fix build errors when proc fs is not configured]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Mon, 24 May 2010 21:32:30 +0000 (14:32 -0700)]
mm: compaction: direct compact when a high-order allocation fails
Ordinarily when a high-order allocation fails, direct reclaim is entered
to free pages to satisfy the allocation. With this patch, it is
determined if an allocation failed due to external fragmentation instead
of low memory and if so, the calling process will compact until a suitable
page is freed. Compaction by moving pages in memory is considerably
cheaper than paging out to disk and works where there are locked pages or
no swap. If compaction fails to free a page of a suitable size, then
reclaim will still occur.
Direct compaction returns as soon as possible. As each block is
compacted, it is checked if a suitable page has been freed and if so, it
returns.
[akpm@linux-foundation.org: Fix build errors]
[aarcange@redhat.com: fix count_vm_event preempt in memory compaction direct reclaim]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Mon, 24 May 2010 21:32:29 +0000 (14:32 -0700)]
mm: compaction: add /sys trigger for per-node memory compaction
Add a per-node sysfs file called compact. When the file is written to,
each zone in that node is compacted. The intention is that this would be
used by something like a job scheduler in a batch system before a job
starts so that the job can allocate the maximum number of hugepages
without significant start-up cost.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Mon, 24 May 2010 21:32:28 +0000 (14:32 -0700)]
mm: compaction: add /proc trigger for memory compaction
Add a proc file /proc/sys/vm/compact_memory. When an arbitrary value is
written to the file, all zones are compacted. The expected user of such a
trigger is a job scheduler that prepares the system before the target
application runs.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Mon, 24 May 2010 21:32:27 +0000 (14:32 -0700)]
mm: compaction: memory compaction core
This patch is the core of a mechanism which compacts memory in a zone by
relocating movable pages towards the end of the zone.
A single compaction run involves a migration scanner and a free scanner.
Both scanners operate on pageblock-sized areas in the zone. The migration
scanner starts at the bottom of the zone and searches for all movable
pages within each area, isolating them onto a private list called
migratelist. The free scanner starts at the top of the zone and searches
for suitable areas and consumes the free pages within, making them
available for the migration scanner. The pages isolated for migration are
then migrated to the newly isolated free pages.
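The two-scanner idea can be illustrated with a toy model. This sketch is
heavily simplified and is not the patch's algorithm: it works on single
pages instead of pageblock-sized areas, keeps no isolation lists, and the
enum and function names are hypothetical.

```c
#include <stddef.h>

enum page_state { PAGE_FREE, PAGE_MOVABLE, PAGE_UNMOVABLE };

/*
 * Move movable pages from the bottom of the zone into free pages found
 * from the top, until the two scanners meet. Returns the number of pages
 * moved.
 */
size_t compact_zone_model(enum page_state *zone, size_t nr_pages)
{
    size_t migrate_scan = 0;     /* migration scanner: walks upwards */
    size_t free_scan = nr_pages; /* free scanner: one past its position */
    size_t moved = 0;

    while (migrate_scan < free_scan) {
        if (zone[migrate_scan] != PAGE_MOVABLE) {
            migrate_scan++;
            continue;
        }
        /* Walk the free scanner down to the next free page. */
        while (free_scan > migrate_scan + 1 &&
               zone[free_scan - 1] != PAGE_FREE)
            free_scan--;
        if (free_scan <= migrate_scan + 1)
            break; /* scanners met: nothing left to migrate into */

        zone[free_scan - 1] = PAGE_MOVABLE; /* "migrate" the page */
        zone[migrate_scan] = PAGE_FREE;
        free_scan--;
        migrate_scan++;
        moved++;
    }
    return moved;
}
```

On a six-page zone laid out as movable, free, unmovable, movable, free,
free, the model moves two pages and leaves the movable pages packed at the
top of the zone.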
[aarcange@redhat.com: Fix unsafe optimisation]
[mel@csn.ul.ie: do not schedule work on other CPUs for compaction]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Mon, 24 May 2010 21:32:26 +0000 (14:32 -0700)]
mm: move definition for LRU isolation modes to a header
Currently, vmscan.c defines the isolation modes for __isolate_lru_page().
Memory compaction needs access to these modes for isolating pages for
migration. This patch exports them.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Mon, 24 May 2010 21:32:26 +0000 (14:32 -0700)]
mm: export fragmentation index via debugfs
The fragmentation index is only meaningful if an allocation
would fail and indicates what the failure is due to. A value of -1 such
as in many of the entries below states that the allocation would succeed.
If it would fail, the value is between 0 and 1. A value tending towards
0 implies the allocation failed due to a lack of memory. A value tending
towards 1 implies it failed due to external fragmentation.
For the most part, the huge page size will be the size of interest but not
necessarily, so it is exported on a per-order and per-zone basis via
/sys/kernel/debug/extfrag/extfrag_index.
> cat /sys/kernel/debug/extfrag/extfrag_index
Node 0, zone DMA -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.00
Node 0, zone Normal -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 -1.000 0.954
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Mon, 24 May 2010 21:32:25 +0000 (14:32 -0700)]
mm: export unusable free space index via debugfs
The unusable free space index measures how much of the available free
memory cannot be used to satisfy an allocation of a given size and is a
value between 0 and 1. The higher the value, the more of free memory is
unusable and by implication, the worse the external fragmentation is. For
the most part, the huge page size will be the size of interest but not
necessarily, so it is exported on a per-order and per-zone basis via
/sys/kernel/debug/extfrag/unusable_index.
> cat /sys/kernel/debug/extfrag/unusable_index
Node 0, zone DMA 0.000 0.000 0.000 0.001 0.005 0.013 0.021 0.037 0.037 0.101 0.230
Node 0, zone Normal 0.000 0.000 0.000 0.001 0.002 0.002 0.005 0.015 0.028 0.028 0.054
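The definition above can be worked through with a small sketch. It is
derived only from the prose description (free memory that cannot satisfy a
request of a given order, as a fraction of all free memory) and is not the
kernel function: the name unusable_index_milli, the per-order free block
array, the scaling by 1000 and the handling of an empty zone are
assumptions. NR_ORDERS is set to 11 to match the eleven columns in the
sample output.

```c
#include <stdint.h>

#define NR_ORDERS 11 /* assumption: one column per order in the output above */

/*
 * free_blocks[o] is the number of free blocks of exactly order o.
 * Returns the unusable fraction for a request of the given order,
 * scaled by 1000 (so 250 means 0.250).
 */
unsigned int unusable_index_milli(const uint64_t free_blocks[NR_ORDERS],
                                  unsigned int order)
{
    uint64_t free_pages = 0;
    uint64_t usable_pages = 0;

    for (unsigned int o = 0; o < NR_ORDERS; o++) {
        uint64_t pages = free_blocks[o] << o;

        free_pages += pages;
        if (o >= order)
            usable_pages += pages; /* block is large enough for the request */
    }
    if (free_pages == 0)
        return 1000; /* assumption: no free memory counts as all unusable */
    return (unsigned int)((free_pages - usable_pages) * 1000 / free_pages);
}
```

For example, with four order-0 blocks, two order-1 blocks and one order-3
block free (16 pages in total), the sketch reports 0 for order 0, 250 for
order 1 and 500 for orders 2 and 3.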
[akpm@linux-foundation.org: Fix allnoconfig]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Mon, 24 May 2010 21:32:24 +0000 (14:32 -0700)]
mm: migration: avoid race between shift_arg_pages() and rmap_walk() during migration by not migrating temporary stacks
Page migration requires rmap to be able to find all ptes mapping a page
at all times, otherwise the migration entry can be instantiated, but it
is possible to leave one behind if the second rmap_walk fails to find
the page. If this page is later faulted, migration_entry_to_page() will
call BUG because the page is locked, indicating the page was migrated but
the migration PTE was not cleaned up. For example:
kernel BUG at include/linux/swapops.h:105!
invalid opcode: 0000 [#1] PREEMPT SMP
...
Call Trace:
[<ffffffff810e951a>] handle_mm_fault+0x3f8/0x76a
[<ffffffff8130c7a2>] do_page_fault+0x44a/0x46e
[<ffffffff813099b5>] page_fault+0x25/0x30
[<ffffffff8114de33>] load_elf_binary+0x152a/0x192b
[<ffffffff8111329b>] search_binary_handler+0x173/0x313
[<ffffffff81114896>] do_execve+0x219/0x30a
[<ffffffff8100a5c6>] sys_execve+0x43/0x5e
[<ffffffff8100320a>] stub_execve+0x6a/0xc0
RIP [<ffffffff811094ff>] migration_entry_wait+0xc1/0x129
There is a race between shift_arg_pages and migration that triggers this
bug. A temporary stack is set up during exec and later moved. If
migration moves a page in the temporary stack and the VMA is then removed
before migration completes, the migration PTE may not be found leading to
a BUG when the stack is faulted.
This patch causes pages within the temporary stack during exec to be
skipped by migration. It does this by marking the VMA covering the
temporary stack with an otherwise impossible combination of VMA flags.
These flags are cleared when the temporary stack is moved to its final
location.
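The marking scheme can be sketched as below. The flag names and bit values
are hypothetical and chosen only for illustration; the commit text does not
say which flags are combined, so pairing a sequential-read hint with a
random-read hint is an assumption of this example.

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration; not the kernel's values. */
#define VMA_HINT_SEQ_READ  0x1u
#define VMA_HINT_RAND_READ 0x2u

/* Two hints that would not normally be set together act as the marker. */
#define VMA_TEMP_STACK (VMA_HINT_SEQ_READ | VMA_HINT_RAND_READ)

/* Set when the temporary stack is created during exec. */
unsigned int mark_temporary_stack(unsigned int flags)
{
    return flags | VMA_TEMP_STACK;
}

/* Cleared once the stack has been moved to its final location. */
unsigned int clear_temporary_stack(unsigned int flags)
{
    return flags & ~VMA_TEMP_STACK;
}

/* Migration skips a VMA only when both marker bits are set. */
bool migration_should_skip(unsigned int flags)
{
    return (flags & VMA_TEMP_STACK) == VMA_TEMP_STACK;
}
```

Reusing a combination that cannot occur otherwise avoids spending a new
flag bit on a state that only exists briefly during exec.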
[kamezawa.hiroyu@jp.fujitsu.com: idea for having migration skip temporary stacks]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Mon, 24 May 2010 21:32:21 +0000 (14:32 -0700)]
mm: allow CONFIG_MIGRATION to be set without CONFIG_NUMA or memory hot-remove
CONFIG_MIGRATION currently depends on CONFIG_NUMA or on the architecture
being able to hot-remove memory. The main users of page migration such as
sys_move_pages(), sys_migrate_pages() and cpuset process migration are
only beneficial on NUMA so it makes sense.
As memory compaction will operate within a zone and is useful on both NUMA
and non-NUMA systems, this patch allows CONFIG_MIGRATION to be set if the
user selects CONFIG_COMPACTION as an option.
[akpm@linux-foundation.org: Depend on CONFIG_HUGETLB_PAGE]
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: Rik van Riel <riel@redhat.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Mon, 24 May 2010 21:32:20 +0000 (14:32 -0700)]
mm: migration: allow the migration of PageSwapCache pages
PageAnon pages that are unmapped may or may not have an anon_vma so are
not currently migrated. However, a swap cache page can be migrated and
fits this description. This patch identifies page swap caches and allows
them to be migrated but ensures that no attempt is made to remap the pages
that would potentially try to access an already freed anon_vma.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Mon, 24 May 2010 21:32:19 +0000 (14:32 -0700)]
mm: migration: do not try to migrate unmapped anonymous pages
rmap_walk_anon() was triggering errors in memory compaction that look like
use-after-free errors. The problem is that between the page being
isolated from the LRU and rcu_read_lock() being taken, the mapcount of the
page dropped to 0 and the anon_vma gets freed. This can happen during
memory compaction if pages being migrated belong to a process that exits
before migration completes. Hence, the use-after-free race looks like
1. Page isolated for migration
2. Process exits
3. page_mapcount(page) drops to zero so anon_vma was no longer reliable
4. unmap_and_move() takes the rcu_lock but the anon_vma is already garbage
5. call try_to_unmap, looks up the anon_vma and "locks" it but the lock
is garbage.
This patch checks the mapcount after the rcu lock is taken. If the
mapcount is zero, the anon_vma is assumed to be freed and no further
action is taken.
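The ordering of the check can be sketched in user space as below. This is only an illustration: struct sketch_page, struct sketch_anon_vma and sketch_anon_vma_usable() are hypothetical stand-ins, not the kernel's structures or the code added by this patch, and the RCU read lock is assumed to be held by the caller.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct page and struct anon_vma. */
struct sketch_anon_vma {
	int lock;
};

struct sketch_page {
	int mapcount;			/* number of page table mappings */
	struct sketch_anon_vma *anon_vma;
};

/*
 * Caller is assumed to hold the RCU read lock already, so the order is:
 * take the lock first, then look at the mapcount.
 * Returns true if the anon_vma may be used, false if the page must be
 * skipped because the anon_vma may already have been freed.
 */
static bool sketch_anon_vma_usable(const struct sketch_page *page)
{
	if (page->mapcount == 0)
		return false;
	return page->anon_vma != NULL;
}
```

A page whose owning process has exited reports a mapcount of zero and is skipped rather than unmapped.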
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Mon, 24 May 2010 21:32:18 +0000 (14:32 -0700)]
mm: migration: share the anon_vma ref counts between KSM and page migration
For clarity of review, KSM and page migration have separate refcounts on
the anon_vma. While clear, this is a waste of memory. This patch gets
KSM and page migration to share their toys in a spirit of harmony.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: Minchan Kim <minchan.kim@gmail.com>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: Christoph Lameter <cl@linux-foundation.org>
Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Mon, 24 May 2010 21:32:17 +0000 (14:32 -0700)]
mm: migration: take a reference to the anon_vma before migrating
This patchset is a memory compaction mechanism that reduces external
memory fragmentation by moving GFP_MOVABLE pages into a smaller number of
pageblocks. The term "compaction" was chosen as there are a number of
mechanisms that are not mutually exclusive that can be used to defragment
memory. For example, lumpy reclaim is a form of defragmentation as was
slub "defragmentation" (really a form of targeted reclaim). Hence, this
is called "compaction" to distinguish it from other forms of
defragmentation.
In this implementation, a full compaction run involves two scanners
operating within a zone - a migration and a free scanner. The migration
scanner starts at the beginning of a zone and finds all movable pages
within one pageblock_nr_pages-sized area and isolates them on a
migratepages list. The free scanner begins at the end of the zone and
searches on a per-area basis for enough free pages to migrate all the
pages on the migratepages list. As each area is respectively migrated or
exhausted of free pages, the scanners are advanced one area. A compaction
run completes within a zone when the two scanners meet.
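A toy model of the two scanners may help when reading the above. It is a sketch under simplifying assumptions: each pageblock is reduced to a pair of counters, pages are moved one at a time instead of being isolated on a migratepages list, and struct sketch_block and sketch_compact_zone() are names made up for this illustration.

```c
/* One pageblock, reduced to counters for the purpose of the sketch. */
struct sketch_block {
	int movable;	/* movable pages currently in the block */
	int free;	/* free pages currently in the block */
};

/*
 * Run one compaction pass over a zone.
 * The migration scanner walks up from the start of the zone, the free
 * scanner walks down from the end, and the pass ends when they meet.
 * Returns the number of pages moved.
 */
static int sketch_compact_zone(struct sketch_block *zone, int nr_blocks)
{
	int migrate = 0;		/* migration scanner position */
	int free = nr_blocks - 1;	/* free scanner position */
	int moved = 0;

	while (migrate < free) {
		if (zone[migrate].movable == 0) {
			migrate++;	/* area fully migrated, advance */
			continue;
		}
		if (zone[free].free == 0) {
			free--;		/* area exhausted of free pages */
			continue;
		}
		/* Move one page from the low block to the high block. */
		zone[migrate].movable--;
		zone[migrate].free++;
		zone[free].free--;
		zone[free].movable++;
		moved++;
	}
	return moved;
}
```

After a pass, the blocks at the start of the zone tend to be entirely free, which is what makes a high-order allocation possible.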
This method is a bit primitive but is easy to understand and greater
sophistication would require maintenance of counters on a per-pageblock
basis. This would have a big impact on allocator fast-paths to improve
compaction which is a poor trade-off.
It also does not try to relocate virtually contiguous pages to be physically
contiguous. However, assuming transparent hugepages were in use, a
hypothetical khugepaged might reuse compaction code to isolate free pages,
split them and relocate userspace pages for promotion.
Memory compaction can be triggered in one of three ways. It may be
triggered explicitly by writing any value to /proc/sys/vm/compact_memory
and compacting all of memory. It can be triggered on a per-node basis by
writing any value to /sys/devices/system/node/nodeN/compact where N is the
node ID to be compacted. When a process fails to allocate a high-order
page, it may compact memory in an attempt to satisfy the allocation
instead of entering direct reclaim. Explicit compaction does not finish
until the two scanners meet and direct compaction ends if a suitable page
becomes available that would meet watermarks.
The series is in 14 patches. The first three are not "core" to the series
but are important pre-requisites.
Patch 1 reference counts anon_vma for rmap_walk_anon(). Without this
patch, it's possible to use anon_vma after free if the caller is
not holding a VMA or mmap_sem for the pages in question. While
there should be no existing user that causes this problem,
it's a requirement for memory compaction to be stable. The patch
is at the start of the series for bisection reasons.
Patch 2 merges the KSM and migrate counts. It could be merged with patch 1
but would be slightly harder to review.
Patch 3 skips over unmapped anon pages during migration as there are no
guarantees about the anon_vma existing. There is a window between
when a page was isolated and migration started during which anon_vma
could disappear.
Patch 4 notes that PageSwapCache pages can still be migrated even if they
are unmapped.
Patch 5 allows CONFIG_MIGRATION to be set without CONFIG_NUMA
Patch 6 exports a "unusable free space index" via debugfs. It's
a measure of external fragmentation that takes the size of the
allocation request into account. It can also be calculated from
userspace so can be dropped if requested
Patch 7 exports a "fragmentation index" which only has meaning when an
allocation request fails. It determines if an allocation failure
would be due to a lack of memory or external fragmentation.
Patch 8 moves the definition for LRU isolation modes for use by compaction
Patch 9 is the compaction mechanism although it's unreachable at this point
Patch 10 adds a means of compacting all of memory with a proc trigger
Patch 11 adds a means of compacting a specific node with a sysfs trigger
Patch 12 adds "direct compaction" before "direct reclaim" if it is
determined there is a good chance of success.
Patch 13 adds a sysctl that allows tuning of the threshold at which the
kernel will compact or direct reclaim
Patch 14 temporarily disables compaction if an allocation failure occurs
after compaction.
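As an illustration of what the "unusable free space index" from patch 6 could look like, the sketch below computes the fraction of free memory that sits in blocks too small for a request of a given order. The exact formula, the per-mille scaling and the names SKETCH_MAX_ORDER and sketch_unusable_free_index() are assumptions made for this example; the text above only says that the index takes the size of the allocation request into account.

```c
#define SKETCH_MAX_ORDER 11	/* assumed number of buddy orders */

/*
 * nr_free[i] is the number of free blocks of order i (2^i pages each).
 * Returns, in thousandths, the share of free pages that cannot satisfy
 * an allocation of the requested order: 0 means all free memory is
 * usable, 1000 means none of it is.
 */
static int sketch_unusable_free_index(const unsigned long *nr_free,
				      unsigned int order)
{
	unsigned long total = 0, usable = 0;

	for (unsigned int i = 0; i < SKETCH_MAX_ORDER; i++) {
		unsigned long pages = nr_free[i] << i;

		total += pages;
		if (i >= order)
			usable += pages;
	}
	if (total == 0)
		return 1000;	/* no free memory at all */
	return (int)(((total - usable) * 1000) / total);
}
```

With 8 order-0 blocks, 4 order-1 blocks and 1 order-3 block free (24 pages in total), an order-3 request can use only 8 of them, so two thirds of the free memory is unusable for that request.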
Testing of compaction was in three stages. For the test, debugging,
preempt, the sleep watchdog and lockdep were all enabled but nothing nasty
popped out. min_free_kbytes was tuned as recommended by hugeadm to help
fragmentation avoidance and high-order allocations. It was tested on X86,
X86-64 and PPC64.
The first test represents one of the easiest cases that can be faced for
lumpy reclaim or memory compaction.
1. Machine freshly booted and configured for hugepage usage with
a) hugeadm --create-global-mounts
b) hugeadm --pool-pages-max DEFAULT:8G
c) hugeadm --set-recommended-min_free_kbytes
d) hugeadm --set-recommended-shmmax
The min_free_kbytes here is important. Anti-fragmentation works best
when pageblocks don't mix. hugeadm knows how to calculate a value that
will significantly reduce the worst of external-fragmentation-related
events as reported by the mm_page_alloc_extfrag tracepoint.
2. Load up memory
a) Start updatedb
b) Create in parallel X files of pagesize*128 in size. Wait
until files are created. By parallel, I mean that 4096 instances
of dd were launched, one after the other using &. The crude
objective being to mix filesystem metadata allocations with
the buffer cache.
c) Delete every second file so that pageblocks are likely to
have holes
d) kill updatedb if it's still running
At this point, the system is quiet, memory is full but it's full with
clean filesystem metadata and clean buffer cache that is unmapped.
This is readily migrated or discarded so you'd expect lumpy reclaim
to have no significant advantage over compaction but this is at
the POC stage.
3. In increments, attempt to allocate 5% of memory as hugepages.
Measure how long it took, how successful it was, how many
direct reclaims took place and how many compactions. Note
the compaction figures might not fully add up as compactions
can take place for orders other than the hugepage size
X86                       vanilla   compaction
Final page count              913          916 (attempted 1002)
pages reclaimed             68296         9791
X86-64                    vanilla   compaction
Final page count:             901          902 (attempted 1002)
Total pages reclaimed:     112599        53234
PPC64                     vanilla   compaction
Final page count:              93           94 (attempted 110)
Total pages reclaimed:     103216        61838
There was not a dramatic improvement in success rates but it wouldn't be
expected in this case either. What was important is that fewer pages were
reclaimed in all cases reducing the amount of IO required to satisfy a
huge page allocation.
The second tests were all performance related - kernbench, netperf, iozone
and sysbench. None showed anything too remarkable.
The last test was a high-order allocation stress test. Many kernel
compiles are started to fill memory with a pressured mix of unmovable and
movable allocations. During this, an attempt is made to allocate 90% of
memory as huge pages - one at a time with small delays between attempts to
avoid flooding the IO queue.
                                         vanilla   compaction
Percentage of request allocated X86           98           99
Percentage of request allocated X86-64        95           98
Percentage of request allocated PPC64         55           70
This patch:
rmap_walk_anon() does not use page_lock_anon_vma() for looking up and
locking an anon_vma and it does not appear to have sufficient locking to
ensure the anon_vma does not disappear from under it.
This patch copies an approach used by KSM to take a reference on the
anon_vma while pages are being migrated. This should prevent rmap_walk()
running into nasty surprises later because anon_vma has been freed.
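The idea of pinning the anon_vma for the duration of migration can be sketched as follows. The structure and helpers (struct sketch_anon_vma_ref, sketch_get(), sketch_put(), sketch_unlink_vma()) are hypothetical and single-threaded; they only model the rule that the object is released when both the last VMA and the last external reference are gone.

```c
#include <stdbool.h>

struct sketch_anon_vma_ref {
	int nr_vmas;		/* VMAs still linked to this anon_vma */
	int external_refcount;	/* references held by migration */
	bool freed;		/* set once the object would be freed */
};

static void sketch_maybe_free(struct sketch_anon_vma_ref *av)
{
	if (av->nr_vmas == 0 && av->external_refcount == 0)
		av->freed = true;
}

/* Migration takes a reference before it starts moving the page. */
static void sketch_get(struct sketch_anon_vma_ref *av)
{
	av->external_refcount++;
}

/* Migration drops the reference once the rmap walk is finished. */
static void sketch_put(struct sketch_anon_vma_ref *av)
{
	av->external_refcount--;
	sketch_maybe_free(av);
}

/* A process exiting unlinks its VMA from the anon_vma. */
static void sketch_unlink_vma(struct sketch_anon_vma_ref *av)
{
	av->nr_vmas--;
	sketch_maybe_free(av);
}
```

If the owning process exits between sketch_get() and sketch_put(), the object survives until the put, so a walk in between never sees freed memory.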
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David Rientjes [Mon, 24 May 2010 21:32:13 +0000 (14:32 -0700)]
mm: default to node zonelist ordering when nodes have only lowmem
There are two types of zonelist ordering methodologies:
- node order, preferring allocations on a node to stay local to it, and
- zone order, preferring allocations come from a higher zone to avoid
allocating in lowmem zones even though they may not be local.
The ordering technique used by the kernel is configurable on the command
line, but also has some logic to determine what the default should be.
This logic currently lacks knowledge of systems where a node may only have
lowmem. For such systems, it is necessary to use node order so that
GFP_KERNEL allocations may be satisfied by nodes consisting of only
lowmem.
If zone order is used, GFP_KERNEL allocations to such nodes are actually
allocated on a node with local affinity that includes ZONE_NORMAL.
This change defaults to node zonelist ordering if any node lacks
ZONE_NORMAL.
To force zone order, append 'numa_zonelist_order=zone' to the kernel
command line.
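The new default can be sketched roughly as below. The types and the function name are invented for this example, and the existing heuristics are represented only by the value passed in as "heuristic"; the sketch shows just the added rule that one node without ZONE_NORMAL forces node order.

```c
enum sketch_zonelist_order {
	SKETCH_ORDER_NODE,
	SKETCH_ORDER_ZONE,
};

struct sketch_node {
	unsigned long lowmem_pages;	/* pages in zones below ZONE_NORMAL */
	unsigned long normal_pages;	/* pages in ZONE_NORMAL */
};

/*
 * Pick the default zonelist order. "heuristic" is whatever the existing
 * logic would have chosen; it is overridden if any node has no
 * ZONE_NORMAL memory.
 */
static enum sketch_zonelist_order
sketch_default_order(const struct sketch_node *nodes, int nr_nodes,
		     enum sketch_zonelist_order heuristic)
{
	for (int i = 0; i < nr_nodes; i++) {
		if (nodes[i].normal_pages == 0)
			return SKETCH_ORDER_NODE;
	}
	return heuristic;
}
```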
Signed-off-by: David Rientjes <rientjes@google.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Naoya Horiguchi [Mon, 24 May 2010 21:32:12 +0000 (14:32 -0700)]
pagemap: add #ifdefs CONFIG_HUGETLB_PAGE on code walking hugetlb vma
If !CONFIG_HUGETLB_PAGE, pagemap_hugetlb_range() is never called. So put
it (and its calling function) into #ifdef block.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: Matt Mackall <mpm@selenic.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Mon, 24 May 2010 21:32:11 +0000 (14:32 -0700)]
mincore: do nested page table walks
Do page table walks with the well-known nested loops we use in several
other places already.
This avoids doing full page table walks after every pte range and also
allows unmapped areas bigger than one pte range to be handled in one go.
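The shape of such a nested walk, reduced to two levels, might look like the sketch below. The shifts, the flat array of "pmd present" flags and all sk_* names are assumptions for illustration; the boundary helper is a simplified variant of the usual addr_end pattern that ignores address wrap-around, and the caller must pass start < end.

```c
#include <stdbool.h>

#define SK_PAGE_SHIFT	12
#define SK_PAGE_SIZE	(1UL << SK_PAGE_SHIFT)
#define SK_PMD_SHIFT	21
#define SK_PMD_SIZE	(1UL << SK_PMD_SHIFT)
#define SK_PMD_MASK	(~(SK_PMD_SIZE - 1))

struct sk_walk_stats {
	unsigned long pte_pages;	/* pages visited one pte at a time */
	unsigned long hole_pages;	/* pages covered by unmapped ranges */
	unsigned long hole_calls;	/* how often the hole path was taken */
};

/* End of the current pmd-sized range, clamped to the end of the walk. */
static unsigned long sk_pmd_addr_end(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + SK_PMD_SIZE) & SK_PMD_MASK;

	return boundary < end ? boundary : end;
}

/* Outer loop over pmd ranges, inner loop over the ptes of mapped ranges. */
static void sk_walk(const bool *pmd_present, unsigned long start,
		    unsigned long end, struct sk_walk_stats *st)
{
	unsigned long addr = start, next;

	do {
		next = sk_pmd_addr_end(addr, end);
		if (!pmd_present[addr >> SK_PMD_SHIFT]) {
			/* Whole unmapped range handled in one go. */
			st->hole_pages += (next - addr) >> SK_PAGE_SHIFT;
			st->hole_calls++;
			continue;
		}
		for (unsigned long a = addr; a != next; a += SK_PAGE_SIZE)
			st->pte_pages++;
	} while (addr = next, addr != end);
}
```

The upper levels are entered once per range instead of once per pte range, and a missing pmd is accounted with a single call.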
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Mon, 24 May 2010 21:32:11 +0000 (14:32 -0700)]
mincore: pass ranges as start,end address pairs
Instead of passing a start address and a number of pages into the helper
functions, convert them to use a start and an end address.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Mon, 24 May 2010 21:32:10 +0000 (14:32 -0700)]
mincore: break do_mincore() into logical pieces
Split out functions to handle hugetlb ranges, pte ranges and unmapped
ranges, to improve readability but also to prepare the file structure for
nested page table walks.
No semantic changes intended.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Mon, 24 May 2010 21:32:09 +0000 (14:32 -0700)]
mincore: cleanups
This fixes some minor issues that bugged me while going over the code:
o adjust argument order of do_mincore() to match the syscall
o simplify range length calculation
o drop superfluous shift in huge tlb calculation, address is page aligned
o drop dead nr_huge calculation
o check pte_none() before pte_present()
o comment and whitespace fixes
No semantic changes intended.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miao Xie [Mon, 24 May 2010 21:32:08 +0000 (14:32 -0700)]
cpuset,mm: fix no node to alloc memory when changing cpuset's mems
Before applying this patch, cpuset updates task->mems_allowed and
mempolicy by setting all new bits in the nodemask first, and clearing all
old unallowed bits later. But along the way, the allocator may find that
there is no node to alloc memory.
The reason is that when cpuset rebinds the task's mempolicy, it clears the
nodes which the allocator can alloc pages on, for example:
(mpol: mempolicy)
task1                   task1's mpol    task2
alloc page              1
  alloc on node0? NO    1
                        1               change mems from 1 to 0
                        1               rebind task1's mpol
                        0-1               set new bits
                        0                 clear disallowed bits
  alloc on node1? NO    0
  ...
can't alloc page
  goto oom
This patch fixes this problem by expanding the nodes range first (set newly
allowed bits) and shrinking it lazily (clear newly disallowed bits). So we
use a variable to tell the write-side task that the read-side task is reading
the nodemask, and the write-side task clears newly disallowed nodes after
the read-side task ends the current memory allocation.
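The expand-then-shrink ordering can be modelled with a plain bitmask, as in the sketch below. It is single-threaded and deliberately simplified: where the description above has the write side wait for the reader, the sketch records the pending mask and applies it when the reader finishes. struct sk_task and the sk_* helpers are hypothetical names.

```c
#include <stdbool.h>

struct sk_task {
	unsigned long mems_allowed;	/* bit n set: node n is allowed */
	unsigned long pending_mask;	/* mask to shrink to later */
	bool allocating;		/* read side is inside an allocation */
	bool shrink_pending;
};

static bool sk_node_allowed(const struct sk_task *t, int node)
{
	return (t->mems_allowed >> node) & 1UL;
}

static void sk_alloc_begin(struct sk_task *t)
{
	t->allocating = true;
}

/* Write side: always expand first, shrink only when no reader is active. */
static void sk_change_mems(struct sk_task *t, unsigned long newmask)
{
	t->mems_allowed |= newmask;		/* set newly allowed bits */
	if (t->allocating) {
		t->pending_mask = newmask;	/* shrink lazily */
		t->shrink_pending = true;
	} else {
		t->mems_allowed = newmask;	/* clear disallowed bits */
	}
}

static void sk_alloc_end(struct sk_task *t)
{
	t->allocating = false;
	if (t->shrink_pending) {
		t->mems_allowed = t->pending_mask;
		t->shrink_pending = false;
	}
}
```

While an allocation is in flight, the reader sees the union of the old and new nodes, so it always finds at least one allowed node.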
[akpm@linux-foundation.org: fix spello]
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Paul Menage <menage@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Miao Xie [Mon, 24 May 2010 21:32:07 +0000 (14:32 -0700)]
mempolicy: restructure rebinding-mempolicy functions
Nick Piggin reported that the allocator may see an empty nodemask when
changing cpuset's mems[1]. It happens only on kernels that do not do
atomic nodemask_t stores. (MAX_NUMNODES > BITS_PER_LONG)
But I found that there is also a problem on kernels that can do atomic
nodemask_t stores. The problem is that the allocator can't find a node to
alloc page when changing cpuset's mems though there is a lot of free
memory. The reason is like this:
(mpol: mempolicy)
task1                   task1's mpol    task2
alloc page              1
  alloc on node0? NO    1
                        1               change mems from 1 to 0
                        1               rebind task1's mpol
                        0-1               set new bits
                        0                 clear disallowed bits
  alloc on node1? NO    0
  ...
can't alloc page
  goto oom
I can use the attached program to reproduce it with the following steps:
# mkdir /dev/cpuset
# mount -t cpuset cpuset /dev/cpuset
# mkdir /dev/cpuset/1
# echo `cat /dev/cpuset/cpus` > /dev/cpuset/1/cpus
# echo `cat /dev/cpuset/mems` > /dev/cpuset/1/mems
# echo $$ > /dev/cpuset/1/tasks
# numactl --membind=`cat /dev/cpuset/mems` ./cpuset_mem_hog <nr_tasks> &
<nr_tasks> = max(nr_cpus - 1, 1)
# killall -s SIGUSR1 cpuset_mem_hog
# ./change_mems.sh
Several hours later, OOM will happen though there is a lot of free memory.
This patchset fixes this problem by expanding the nodes range first (set
newly allowed bits) and shrinking it lazily (clear newly disallowed bits). So
we use a variable to tell the write-side task that the read-side task is
reading the nodemask, and the write-side task clears newly disallowed nodes
after the read-side task ends the current memory allocation.
This patch:
In order to fix no node to alloc memory, when we want to update mempolicy
and mems_allowed, we expand the set of nodes first (set all the newly
allowed nodes) and shrink the set of nodes lazily (clear disallowed nodes).
But the mempolicy's rebind functions may break the expanding.
So we restructure the mempolicy's rebind functions and split the rebind
work to two steps, just like the update of cpuset's mems: The 1st step:
expand the set of the mempolicy's nodes. The 2nd step: shrink the set of
the mempolicy's nodes. It is used when there is no real lock to protect
the mempolicy in the read-side. Otherwise we can do rebind work at once.
In order to implement it, we define
enum mpol_rebind_step {
MPOL_REBIND_ONCE,
MPOL_REBIND_STEP1,
MPOL_REBIND_STEP2,
MPOL_REBIND_NSTEP,
};
If the mempolicy needn't be updated by two steps, we can pass
MPOL_REBIND_ONCE to the rebind functions. Or we can pass
MPOL_REBIND_STEP1 to do the first step of the rebind work and pass
MPOL_REBIND_STEP2 to do the second step work.
Besides that, there may be a long time between these two steps and we have to
release the lock that protects mempolicy and mems_allowed. If we take the
lock once again, we must check whether the current mempolicy is under
rebinding (the first step has been done) or not, because the task may
alloc a new mempolicy when we don't hold the lock. So we define the
following flag to identify it:
#define MPOL_F_REBINDING (1 << 2)
The new functions will be used in the next patch.
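A rough sketch of how the steps and the flag could fit together is given below. The enum and the flag value are the ones defined above; struct sk_mempolicy, sk_mpol_rebind() and sk_finish_rebind() are hypothetical, the nodemask is a plain bitmask, and the fallback to a one-shot rebind for a policy that never saw the first step is an assumption of this example.

```c
enum mpol_rebind_step {
	MPOL_REBIND_ONCE,
	MPOL_REBIND_STEP1,
	MPOL_REBIND_STEP2,
	MPOL_REBIND_NSTEP,
};

#define MPOL_F_REBINDING (1 << 2)

/* Hypothetical, much reduced mempolicy. */
struct sk_mempolicy {
	unsigned long nodes;	/* bit n set: node n is in the policy */
	unsigned short flags;
};

static void sk_mpol_rebind(struct sk_mempolicy *pol, unsigned long newmask,
			   enum mpol_rebind_step step)
{
	switch (step) {
	case MPOL_REBIND_ONCE:
		pol->nodes = newmask;		/* expand and shrink at once */
		break;
	case MPOL_REBIND_STEP1:
		pol->nodes |= newmask;		/* expand only */
		pol->flags |= MPOL_F_REBINDING;
		break;
	case MPOL_REBIND_STEP2:
		pol->nodes = newmask;		/* shrink to the new set */
		pol->flags &= ~MPOL_F_REBINDING;
		break;
	default:
		break;
	}
}

/*
 * Called after the lock has been retaken. If the task installed a new
 * policy in the meantime, the flag is not set and the rebind is done in
 * one go (an assumption of this sketch).
 */
static void sk_finish_rebind(struct sk_mempolicy *pol, unsigned long newmask)
{
	if (pol->flags & MPOL_F_REBINDING)
		sk_mpol_rebind(pol, newmask, MPOL_REBIND_STEP2);
	else
		sk_mpol_rebind(pol, newmask, MPOL_REBIND_ONCE);
}
```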
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Paul Menage <menage@google.com>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Andi Kleen <andi@firstfloor.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Schermerhorn [Mon, 24 May 2010 21:32:05 +0000 (14:32 -0700)]
mempolicy: document cpuset interaction with tmpfs mpol mount option
Update Documentation/filesystems/tmpfs.txt to describe the interaction of
tmpfs mount option memory policy with tasks' cpuset mems_allowed.
Note: the mount(8) man page [in the util-linux-ng package] requires
similar updates.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Schermerhorn [Mon, 24 May 2010 21:32:04 +0000 (14:32 -0700)]
mempolicy: factor mpol_shared_policy_init() return paths
Factor out duplicate put/frees in mpol_shared_policy_init() to a common
return path.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Schermerhorn [Mon, 24 May 2010 21:32:04 +0000 (14:32 -0700)]
mempolicy: rename policy_types and cleanup initialization
Rename 'policy_types[]' to 'policy_modes[]' to better match the array
contents.
Use designated initializer syntax for policy_modes[].
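For readers unfamiliar with the syntax, a designated initializer ties each string to its enum value, so the table stays correct even if the enum is reordered. The enum, the strings and the helper below are illustrative stand-ins with an sk_ prefix, not the actual contents of policy_modes[].

```c
#include <stddef.h>

enum sk_mpol_mode {
	SK_MPOL_DEFAULT,
	SK_MPOL_PREFERRED,
	SK_MPOL_BIND,
	SK_MPOL_INTERLEAVE,
	SK_MPOL_MAX,		/* always last */
};

/* Each entry names its index explicitly instead of relying on position. */
static const char * const sk_policy_modes[SK_MPOL_MAX] = {
	[SK_MPOL_DEFAULT]    = "default",
	[SK_MPOL_PREFERRED]  = "prefer",
	[SK_MPOL_BIND]       = "bind",
	[SK_MPOL_INTERLEAVE] = "interleave",
};

/* Returns the name for a mode, or NULL if the mode is out of range. */
static const char *sk_mode_name(int mode)
{
	if (mode < 0 || mode >= SK_MPOL_MAX)
		return NULL;
	return sk_policy_modes[mode];
}
```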
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Schermerhorn [Mon, 24 May 2010 21:32:03 +0000 (14:32 -0700)]
mempolicy: lose unnecessary loop variable in mpol_parse_str()
We don't really need the extra variable 'i' in mpol_parse_str(). The only
use is as the loop variable. Then, it's assigned to 'mode'. Just use
mode, and lose the 'uninitialized_var()' macro.
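The resulting loop shape is roughly the following. This is a sketch with made-up names (sk_mode_names[], sk_parse_mode()) and illustrative strings, not the body of mpol_parse_str(); it only shows 'mode' serving as the loop variable, which also means it is always initialized before use.

```c
#include <string.h>

enum sk_mode {
	SK_MODE_DEFAULT,
	SK_MODE_PREFER,
	SK_MODE_BIND,
	SK_MODE_INTERLEAVE,
	SK_MODE_MAX,
};

static const char * const sk_mode_names[SK_MODE_MAX] = {
	[SK_MODE_DEFAULT]    = "default",
	[SK_MODE_PREFER]     = "prefer",
	[SK_MODE_BIND]       = "bind",
	[SK_MODE_INTERLEAVE] = "interleave",
};

/* Returns the matching mode, or SK_MODE_MAX if the string is unknown. */
static int sk_parse_mode(const char *str)
{
	int mode;

	/* 'mode' is the loop variable itself, no separate 'i' is needed. */
	for (mode = 0; mode < SK_MODE_MAX; mode++) {
		if (strcmp(str, sk_mode_names[mode]) == 0)
			break;
	}
	return mode;
}
```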
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lee Schermerhorn [Mon, 24 May 2010 21:32:02 +0000 (14:32 -0700)]
mempolicy: don't call mpol_set_nodemask() when no_context
No need to call mpol_set_nodemask() when we have no context for the
mempolicy. This can occur when we're parsing a tmpfs 'mpol' mount option.
Just save the raw nodemask in the mempolicy's w.user_nodemask member for
use when a tmpfs/shmem file is created. mpol_shared_policy_init() will
"contextualize" the policy for the new file based on the creating task's
context.
Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com>
Cc: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Cc: Ravikiran Thirumalai <kiran@scalex86.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bob Liu [Mon, 24 May 2010 21:32:01 +0000 (14:32 -0700)]
mempolicy: remove redundant check
Lee's patch "mempolicy: use MPOL_PREFERRED for system-wide default policy"
has made the MPOL_DEFAULT only used in the memory policy APIs. So, no
need to check in __mpol_equal also. Also get rid of mpol_match_intent()
and move its logic directly into __mpol_equal().
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bob Liu [Mon, 24 May 2010 21:32:00 +0000 (14:32 -0700)]
mempolicy: remove case MPOL_INTERLEAVE from policy_zonelist()
In policy_zonelist() mode MPOL_INTERLEAVE shouldn't happen, so fall
through to BUG() instead of break to return. I also fixed the comment.
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bob Liu [Mon, 24 May 2010 21:31:59 +0000 (14:31 -0700)]
mempolicy: remove redundant code
1. In function is_valid_nodemask(), variable k will be initialized to 0 in
the following loop, so it needn't be initialized to policy_zone anymore.
2. (MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES) is already defined
as MPOL_MODE_FLAGS in mempolicy.h.
Signed-off-by: Bob Liu <lliubbo@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>