Saurav Kashyap [Thu, 14 Jul 2011 19:00:13 +0000 (12:00 -0700)]
[SCSI] qla2xxx: Code changes to support new dynamic logging infrastructure.
The code is changed to support the new dynamic logging infrastructure.
Following are the levels added.
Default is 0 - no logging. 0x40000000 - Module Init & Probe.
0x20000000 - Mailbox Cmnds. 0x10000000 - Device Discovery.
0x08000000 - IO tracing. 0x04000000 - DPC Thread.
0x02000000 - Async events. 0x01000000 - Timer routines.
0x00800000 - User space. 0x00400000 - Task Management.
0x00200000 - AER/EEH. 0x00100000 - Multi Q.
0x00080000 - P3P Specific. 0x00040000 - Virtual Port.
0x00020000 - Buffer Dump. 0x00010000 - Misc.
0x7fffffff - For enabling all logs, can be too many logs.
Setting ql2xextended_error_logging module parameter to any of the above
value, will enable the debug for that particular level.
Do LOGICAL OR of the value to enable more than one level.
Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com>
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: Joe Carnuccio <joe.carnuccio@qlogic.com>
Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com>
Signed-off-by: Madhuranath Iyengar <Madhu.Iyengar@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Saurav Kashyap [Thu, 14 Jul 2011 19:00:12 +0000 (12:00 -0700)]
[SCSI] qla2xxx: Basic infrastructure for dynamic logging.
This patch adds the dynamic logging framework to the qla2xxx driver.
The user will be able to change the logging levels on the fly i.e.
without load/unload of the driver. This also enables logging to be
enabled for a particular section of the driver such as initialization,
device discovery etc.
Signed-off-by: Saurav Kashyap <saurav.kashyap@qlogic.com>
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
James Smart [Wed, 6 Jul 2011 16:45:17 +0000 (12:45 -0400)]
[SCSI] scsi_lib: pause between error retries
During cable pull tests on our 16G FC adapter, we are seeing errors,
typically reads to close targets, which fail due to CRC or framing
errors caused by the cable being pull (return status DID_ERROR).
The adapter detects the error on one of the first frames received,
marks the FC exchange as dead (further frames go to bit bucket) and
signals the host of the error. This action is so quick, and coupled
with fast host CPUs, creates a scenario in which the midlayer sees
the failure and retries the io almost immediately. We've seen link
traces with the retry on the link while the original i/o is still
being processed by the target. We're also seeing the time window
for the "link to pull-apart" and the physical interface to report
disconnected to be in the few millisecond range. Which means, we're
encountering scenarios where the full retry count is exhausted
(all with error) by the midlayer before the link disconnect state
is detected.
We looked at 8G FC behavior and occasionally see the same behavior,
but as the link was slower, it rarely could exhaust all retries
before the link reported disconnect.
What is needed is a slight delay between io retries due to DID_ERROR
to cover this error. It is inappropriate to put this delay in the
driver, as the error is indistinguishable from other link-related errors,
nor does the driver track whether the io is a retry or not. This is also
easier than tracking between-io-error bursts that are seen in this
scenario.
The patch below updates the retry path so that it inserts a delay as
if the target was busy. The busy delay is on the order of 6ms. This
delay is sufficient to ensure the link down condition is reported
before the retry count is exhausted (at most 1 retry is seen).
Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Kashyap, Desai [Tue, 5 Jul 2011 07:10:23 +0000 (12:40 +0530)]
[SCSI] mpt2sas: WarpDrive Infinite command retries due to wrong scsi command entry in MPI message
Issue:
This issue is seen on LSI H/W WarpDrive SSS6200 When filed direct I/O
is tried as volume I/O the scmd field in internal lookup table get
cleared and because of that the retried volume I/O never gets reported
as completed to SML.
Result:
I/O timeout and Error handling thread will kicking off
Fix:
Setting back the scmd in the lookup table before retrying the failed
direct i/o
Signed-off-by: Kashyap Desai <kashyap.desai@lsi.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Bhanu Prakash Gollapudi [Tue, 28 Jun 2011 06:30:53 +0000 (23:30 -0700)]
[SCSI] bnx2fc: Replace printks with KERN_ALERT to KERN_ERR/KERN_INFO
Signed-off-by: Bhanu Prakash Gollapudi <bprakash@broadcom.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Hannes Reinecke [Wed, 29 Jun 2011 20:49:57 +0000 (02:19 +0530)]
[SCSI] scsi_transport_spi: Export host width and HBA id
Currently it's impossible to find out if the host supports
wide SCSI unless you're committed to trawl through syslog.
And it's near impossible to find the actual HBA id, which
is settable for some SCSI HBAs (like aic7xxx).
So export them via sysfs.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ankit Jain <jankit@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Xiangliang Yu [Thu, 30 Jun 2011 14:27:36 +0000 (22:27 +0800)]
[SCSI] mvsas: Add support for interrupt tasklet
Add support for interrupt tasklet, which will improve performance.
Correct spelling of "20011"
[jejb: simplified ifdefs and fixed unused variable problem]
Signed-off-by: Xiangliang Yu <yuxiangl@marvell.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Xiangliang Yu [Tue, 24 May 2011 14:38:10 +0000 (22:38 +0800)]
[SCSI] mvsas: update comments
Remove obsolete comments and add new comments
Signed-off-by: Xiangliang Yu <yuxiangl@marvell.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Xiangliang Yu [Tue, 24 May 2011 14:37:25 +0000 (22:37 +0800)]
[SCSI] mvsas: misc improvements
Change code to match HBA datasheet.
Change code to make it readable.
Add support big endian for mvs_prd_imt.
Add cpu_to_le32 and cpu_to_le64 to use on addr.
Add scan_finished for structure mvs_prv_info.
Signed-off-by: Xiangliang Yu <yuxiangl@marvell.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Xiangliang Yu [Tue, 24 May 2011 14:36:02 +0000 (22:36 +0800)]
[SCSI] mvsas: Add new macros and functions
Add new macros: MVS_SOFT_RESET, MVS_HARD_RESET, MVS_PHY_TUNE,
MVS_COMMAND_ACTIVE, EXP_BRCT_CHG, MVS_MAX_SG
Add new member sg_width in struct mvs_chip_info
Use macros rather than magic number
Add new functions: mvs_fill_ssp_resp_iu, mvs_set_sense,
mvs_94xx_clear_srs_irq, mvs_94xx_phy_set_link_rate
Signed-off-by: Xiangliang Yu <yuxiangl@marvell.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Xiangliang Yu [Tue, 24 May 2011 14:35:09 +0000 (22:35 +0800)]
[SCSI] mvsas: Remove unused macros, variables and functions
Remove unused macros: VSR_PHY_VS0, VSR_PHY_VS1, MVS_SLOTS,
MVS_CAN_QUEUE, MVS_MSI, SG_MX, _MV_DUMP, MV_DISABLE_NCQ
Remove unused variables for mvs_info: irq, exp_req, cmd_size
Remove unused functions: mvs_get_sas_addr, mvs_hexdump,
mvs_hba_sb_dump, mvs_hab_memory_dump, mvs_hba_cq_dump
Signed-off-by: Xiangliang Yu <yuxiangl@marvell.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Xiangliang Yu [Tue, 24 May 2011 14:33:11 +0000 (22:33 +0800)]
[SCSI] mvsas: fix 94xx hotplug issue
Fix 94xx A0/B0 revision hotplug issue.
Remove unused macro: DISABLE_HOTPLUG_DMA_FIX
Signed-off-by: Xiangliang Yu <yuxiangl@marvell.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Xiangliang Yu [Tue, 24 May 2011 14:31:47 +0000 (22:31 +0800)]
[SCSI] mvsas: Add driver version and interrupt coalescing to device attributes in sysfs
Signed-off-by: Xiangliang Yu <yuxiangl@marvell.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Xiangliang Yu [Tue, 24 May 2011 14:28:31 +0000 (22:28 +0800)]
[SCSI] mvsas: add support for 94xx phy tuning and multiple revisions
Add 94xx phy tuning to aid manufacturing.
Add support for 94xx multiple revisions: A0, B0, C0, C1, C2.
Signed-off-by: Xiangliang Yu <yuxiangl@marvell.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Xiangliang Yu [Tue, 24 May 2011 14:26:50 +0000 (22:26 +0800)]
[SCSI] mvsas: Add support for Non specific NCQ error interrupt
Signed-off-by: Xiangliang Yu <yuxiangl@marvell.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Linus Torvalds [Tue, 26 Jul 2011 04:00:19 +0000 (21:00 -0700)]
Merge 'akpm' patch series
* Merge akpm patch series: (122 commits)
drivers/connector/cn_proc.c: remove unused local
Documentation/SubmitChecklist: add RCU debug config options
reiserfs: use hweight_long()
reiserfs: use proper little-endian bitops
pnpacpi: register disabled resources
drivers/rtc/rtc-tegra.c: properly initialize spinlock
drivers/rtc/rtc-twl.c: check return value of twl_rtc_write_u8() in twl_rtc_set_time()
drivers/rtc: add support for Qualcomm PMIC8xxx RTC
drivers/rtc/rtc-s3c.c: support clock gating
drivers/rtc/rtc-mpc5121.c: add support for RTC on MPC5200
init: skip calibration delay if previously done
misc/eeprom: add eeprom access driver for digsy_mtc board
misc/eeprom: add driver for microwire 93xx46 EEPROMs
checkpatch.pl: update $logFunctions
checkpatch: make utf-8 test --strict
checkpatch.pl: add ability to ignore various messages
checkpatch: add a "prefer __aligned" check
checkpatch: validate signature styles and To: and Cc: lines
checkpatch: add __rcu as a sparse modifier
checkpatch: suggest using min_t or max_t
...
Did this as a merge because of (trivial) conflicts in
- Documentation/feature-removal-schedule.txt
- arch/xtensa/include/asm/uaccess.h
that were just easier to fix up in the merge than in the patch series.
Andrew Morton [Tue, 26 Jul 2011 00:13:39 +0000 (17:13 -0700)]
drivers/connector/cn_proc.c: remove unused local
Fix the warning
drivers/connector/cn_proc.c: In function 'proc_ptrace_connector':
drivers/connector/cn_proc.c:176: warning: unused variable 'tracer'
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paul E. McKenney [Tue, 26 Jul 2011 00:13:38 +0000 (17:13 -0700)]
Documentation/SubmitChecklist: add RCU debug config options
There have been persistent lockdep RCU splats, indicating that submitters
are not testing with CONFIG_PROVE_RCU. Add this config option to the list
in Documentation/SubmitChecklist. Also add CONFIG_DEBUG_OBJECTS_RCU_HEAD
for good measure.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Tue, 26 Jul 2011 00:13:38 +0000 (17:13 -0700)]
reiserfs: use hweight_long()
Use hweight_long() to count free bits in the bitmap.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Akinobu Mita [Tue, 26 Jul 2011 00:13:37 +0000 (17:13 -0700)]
reiserfs: use proper little-endian bitops
Using __test_and_{set,clear}_bit_le() with ignoring its return value can
be replaced with __{set,clear}_bit_le().
This introduces reiserfs_{set,clear}_le_bit for __{set,clear}_bit_le and
does the above change with them.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Witold Szczeponik [Tue, 26 Jul 2011 00:13:36 +0000 (17:13 -0700)]
pnpacpi: register disabled resources
When parsing PnP ACPI resource structures, it may happen that some of
the resources are disabled (in which case "the size" of the resource
equals zero).
The current solution is to skip these resources completely - with the
unfortunate side effect that they are not registered despite the fact
that they exist, after all. (The downside of this approach is that
these resources cannot be used as templates for setting the actual
device's resources because they are missing from the template.) The
kernel's APM implementation does not suffer from this problem and
registers all resources regardless of "their size".
This patch fixes a problem with (at least) the vintage IBM ThinkPad 600E
(and most likely also with the 600, 600X, and 770X which have a very
similar layout) where some of its PnP devices support options where
either an IRQ, a DMA, or an IO port is disabled. Without this patch,
the devices can not be configured using the
"/sys/bus/pnp/devices/*/resources" interface.
The manipulation of these resources is important because the 600E has
very demanding requirements. For instance, the number of IRQs is not
sufficient to support all devices of the 600E. Fortunately, some of the
devices, like the sound card's MPU-401 UART, can be configured to not
use any IRQ, hence freeing an IRQ for a device that requires one.
(Still, the device's "ResourceTemplate" requires an IRQ resource
descriptor which cannot be created if the resource has not been
registered in the first place.)
As an example, the dependent sets of the 600E's CSC0103 device (the
MPU-401 UART) are listed, with the patch applied, as:
Dependent: 00 - Priority preferred
port 0x300-0x330, align 0xf, size 0x4, 16-bit address decoding
irq <none> High-Edge
Dependent: 01 - Priority acceptable
port 0x300-0x330, align 0xf, size 0x4, 16-bit address decoding
irq 5,7,2/9,10,11,15 High-Edge
(The same result is obtained when PNPBIOS is used instead of PnP ACPI.)
Without the patch, the IRQ resource in the preferred option is not
listed at all:
Dependent: 00 - Priority preferred
port 0x300-0x330, align 0xf, size 0x4, 16-bit address decoding
Dependent: 01 - Priority acceptable
port 0x300-0x330, align 0xf, size 0x4, 16-bit address decoding
irq 5,7,2/9,10,11,15 High-Edge
And in fact, the 600E's DSDT lists the disabled IRQ as an option, as can
be seen from the following excerpt from the DSDT:
Name (_PRS, ResourceTemplate ()
{
StartDependentFn (0x00, 0x00)
{
IO (Decode16, 0x0300, 0x0330, 0x10, 0x04)
IRQNoFlags () {}
}
StartDependentFn (0x01, 0x00)
{
IO (Decode16, 0x0300, 0x0330, 0x10, 0x04)
IRQNoFlags () {5,7,9,10,11,15}
}
EndDependentFn ()
})
With this patch applied, a user space program - or maybe even the kernel
- can allocate all devices' resources optimally. For the 600E, this
means to find optimal resources for (at least) the serial port, the
parallel port, the infrared port, the MWAVE modem, the sound card, and
the MPU-401 UART.
The patch applies the idea to register disabled resources to all types
of resources, not just to IRQs, DMAs, and IO ports. At the same time,
it mimics the behavior of the "pnp_assign_xxx" functions from
"drivers/pnp/manager.c" where resources with "no size" are considered
disabled.
No regressions were observed on hardware that does not require this
patch.
The patch is applied against 2.6.39.
NB: The kernel's current PnP interface does not allow for disabling individual
resources using the "/sys/bus/pnp/devices/$device/resources" file. Assuming
this could be done, a device could be configured to use a disabled resource
using a simple series of calls:
echo disable > /sys/bus/pnp/devices/$device/resources
echo clear > /sys/bus/pnp/devices/$device/resources
echo set irq disabled > /sys/bus/pnp/devices/$device/resources
echo fill > /sys/bus/pnp/devices/$device/resources
echo activate > /sys/bus/pnp/devices/$device/resources
This patch addresses only the parsing of PnP ACPI devices.
ChangeLog (v1 -> v2):
- extend patch description
- fix typo in patch itself
Signed-off-by: Witold Szczeponik <Witold.Szczeponik@gmx.net>
Cc: Len Brown <lenb@kernel.org>
Cc: Adam Belay <abelay@mit.edu>
Cc: Bjorn Helgaas <bjorn.helgaas@hp.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Cc: Matthew Garrett <mjg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uwe Kleine-König [Tue, 26 Jul 2011 00:13:34 +0000 (17:13 -0700)]
drivers/rtc/rtc-tegra.c: properly initialize spinlock
Using __SPIN_LOCK_UNLOCKED for a dynamically allocated lock is wrong and
breaks the build with PREEMPT_RT_FULL.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Andrew Chew <achew@nvidia.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jesper Juhl [Tue, 26 Jul 2011 00:13:34 +0000 (17:13 -0700)]
drivers/rtc/rtc-twl.c: check return value of twl_rtc_write_u8() in twl_rtc_set_time()
We forget to save the return value of the call to
twl_rtc_write_u8(save_control, REG_RTC_CTRL_REG); in 'ret', making the
test of 'ret < 0' dead code since 'ret' then couldn't possibly have
changed since the last test just a few lines above. It also makes us not
detect failures from that specific twl_rtc_write_u8() call.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Alexandre Rusev <source@mvista.com>
Cc: "George G. Davis" <gdavis@mvista.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anirudh Ghayal [Tue, 26 Jul 2011 00:13:33 +0000 (17:13 -0700)]
drivers/rtc: add support for Qualcomm PMIC8xxx RTC
Add support for PMIC8xxx based RTC. PMIC8xxx is Qualcomm's power
management IC that internally houses an RTC module. This driver
communicates with the PMIC module over SSBI bus.
[akpm@linux-foundation.org: cosmetic tweaks]
Acked-by: Wan ZongShun <mcuos.com@gmail.com>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Anirudh Ghayal <aghayal@codeaurora.org>
Signed-off-by: Ashay Jaiswal <ashayj@codeaurora.org>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: Wan ZongShun <mcuos.com@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Donggeun Kim [Tue, 26 Jul 2011 00:13:32 +0000 (17:13 -0700)]
drivers/rtc/rtc-s3c.c: support clock gating
Add support for clock gating. Power consumption can be reduced by setting
rtc_clk disabled state except for when RTC related registers are accessed.
Signed-off-by: Donggeun Kim <dg77.kim@samsung.com>
Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: KyungMin Park <kyungmin.park@samsung.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Ben Dooks <ben@fluff.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Dmitry Eremin-Solenikov [Tue, 26 Jul 2011 00:13:30 +0000 (17:13 -0700)]
drivers/rtc/rtc-mpc5121.c: add support for RTC on MPC5200
MPC5200B contains a limited version of RTC from MPC5121. Add support for
the RTC on that CPU.
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: Anatolij Gustschin <agust@denx.de>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sameer Nanda [Tue, 26 Jul 2011 00:13:29 +0000 (17:13 -0700)]
init: skip calibration delay if previously done
For each CPU, do the calibration delay only once. For subsequent calls,
use the cached per-CPU value of loops_per_jiffy.
This saves about 200ms of resume time on dual core Intel Atom N5xx based
systems. This helps bring down the kernel resume time on such systems
from about 500ms to about 300ms.
[akpm@linux-foundation.org: make cpu_loops_per_jiffy static]
[akpm@linux-foundation.org: clean up message text]
[akpm@linux-foundation.org: fix things up after upstream rmk changes]
Signed-off-by: Sameer Nanda <snanda@chromium.org>
Cc: Phil Carmody <ext-phil.2.carmody@nokia.com>
Cc: Andrew Worsley <amworsley@gmail.com>
Cc: David Daney <ddaney@caviumnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anatolij Gustschin [Tue, 26 Jul 2011 00:13:29 +0000 (17:13 -0700)]
misc/eeprom: add eeprom access driver for digsy_mtc board
Both displays on digsy_mtc board obtain their configuration from microwire
EEPROMs which are connected to the SoC over GPIO lines. We need an easy
way to access the EEPROMs to write the needed display configuration or to
read out the currently programmed configuration. The generic
eeprom_93xx46 SPI driver added by previous patch allows EEPROM access over
sysfs. Using the simple driver added by this patch we provide used GPIO
interface and access control description on the board for generic
eeprom_93xx46 driver and spi_gpio driver.
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Cc: Jonathan Cameron <jic23@cam.ac.uk>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Anatolij Gustschin [Tue, 26 Jul 2011 00:13:27 +0000 (17:13 -0700)]
misc/eeprom: add driver for microwire 93xx46 EEPROMs
Add EEPROM driver for 93xx46 chips. It can also be used with spi_gpio
driver to access 93xx46 EEPROMs connected over GPIO lines. This driver
supports read/write/erase access to the EEPROM chips over sysfs files.
[rdunlap@xenotime.net: fix printk format]
Signed-off-by: Anatolij Gustschin <agust@denx.de>
Cc: Jonathan Cameron <jic23@cam.ac.uk>
Cc: Grant Likely <grant.likely@secretlab.ca>
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 26 Jul 2011 00:13:27 +0000 (17:13 -0700)]
checkpatch.pl: update $logFunctions
Previous behavior allowed only alphabetic prefixes like pr_info to exceed
the 80 column line length limit.
ath6kl wants to add a digit into the prefix, so allow numbers as well as
digits in the <prefix>_<level> printks.
<prefix>_<level>_ratelimited and <prefix>_<level>_once and WARN_RATELIMIT
and WARN_ONCE may now exceed 80 cols.
Add missing <prefix>_printk type for completeness.
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Kalle Valo <kvalo@qca.qualcomm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 26 Jul 2011 00:13:26 +0000 (17:13 -0700)]
checkpatch: make utf-8 test --strict
Some patches are sent in using ISO-8859 or even Windows codepage 1252.
Make checkpatch accept these by default and only emit the "Invalid UTF-8"
message when using --strict.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 26 Jul 2011 00:13:25 +0000 (17:13 -0700)]
checkpatch.pl: add ability to ignore various messages
Some users would like the ability to not emit some of the messages that
checkpatch produces. This can make it easier to use checkpatch in other
projects and integrate into scm hook scripts.
Add command line option to "--ignore" various message types. Add option
--show-types to emit the "type" of each message. Categorize all ERROR,
WARN and CHK messages with types.
Add optional .checkpatch.conf file to store default options.
3 paths are searched for .checkpatch.conf
. customized per-tree configurations
$HOME user global configuration when per-tree configs don't exist
./scripts lk defaults to override script
The .conf file can contain any valid command-line argument and
the contents are prepended to any additional command line arguments.
Multiple lines may be used, blank lines are ignored, # is a comment.
Update "false positive" output for readability.
Update version to 0.32
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Cc: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 26 Jul 2011 00:13:24 +0000 (17:13 -0700)]
checkpatch: add a "prefer __aligned" check
Prefer the use of __aligned(size) over __attribute__((__aligned___(size)))
Link: http://lkml.kernel.org/r/20110609094526.1571774c.akpm@linux-foundation.org
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 26 Jul 2011 00:13:23 +0000 (17:13 -0700)]
checkpatch: validate signature styles and To: and Cc: lines
Signatures have many forms and can sometimes cause problems if not in the
correct format when using git send-email or quilt.
Try to verify the signature tags and email addresses to use the generally
accepted "Signed-off-by: Full Name <email@domain.tld>" form.
Original idea by Anish Kumar <anish198519851985@gmail.com>
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Anish Kumar <anish198519851985@gmail.com>
Cc: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Sven Eckelmann [Tue, 26 Jul 2011 00:13:23 +0000 (17:13 -0700)]
checkpatch: add __rcu as a sparse modifier
Fix "need consistent spacing around '*'" error after a __rcu sparse
annotation which was caused by the missing __rcu entry in the
checkpatch.pl internal list of sparse keywords.
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 26 Jul 2011 00:13:22 +0000 (17:13 -0700)]
checkpatch: suggest using min_t or max_t
A common issue with min() or max() is using a cast on one or both of the
arguments when using min_t/max_t could be better.
Add cast detection to uses of min/max and suggest an appropriate use of
min_t or max_t instead.
Caveat: This only works for min() or max() on a single line.
It does not find min() or max() split across multiple lines.
This does find:
min((u32)foo, bar);
But it does not find:
max((unsigned long)foo,
bar);
Suggested-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Tue, 26 Jul 2011 00:13:21 +0000 (17:13 -0700)]
drivers/firmware/sigma.c needs MODULE_LICENSE
Fix module tainting message:
sigma: module license 'unspecified' taints kernel.
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Andy Shevchenko [Tue, 26 Jul 2011 00:13:20 +0000 (17:13 -0700)]
lib: make _tolower() public
This function is required by *printf and kstrto* functions that are
located in the different modules. This patch makes _tolower() public.
However, it's good idea to not use the helper outside of mentioned
functions.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
H Hartley Sweeten [Tue, 26 Jul 2011 00:13:20 +0000 (17:13 -0700)]
lib/lcm.c: quiet sparse noise
The symbol 'lcm' is exported to the kernel (EXPORT_SYMBOL_GPL).
Pick up it's definition in <linux/lcm.h> to quiet the sparse noise:
warning: symbol 'lcm' was not declared. Should it be static?
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Walleij [Tue, 26 Jul 2011 00:13:18 +0000 (17:13 -0700)]
mach-ux500: add lm3530 ALS platform data for U5500
From: Shreshtha Kumar Sahu <shreshthakumar.sahu@stericsson.com>
platform data for simple backlight driver for LM3530
in the u5500 platform
Signed-off-by: Shreshtha Kumar Sahu <shreshthakumar.sahu@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Lee Jones <lee.jones@linaro.org>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Shreshtha Kumar Sahu [Tue, 26 Jul 2011 00:13:17 +0000 (17:13 -0700)]
arch/arm/mach-ux500/board-u5500.c: calibrate ALS input voltage
Provide the support for auto calibration of ALS Zone boundaries based on
min/max ALS input voltage.
Signed-off-by: Shreshtha Kumar Sahu <shreshthakumar.sahu@stericsson.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Cc: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Axel Lin [Tue, 26 Jul 2011 00:13:16 +0000 (17:13 -0700)]
drivers/leds/leds-netxbig: make LEDS_NETXBIG depend on LEDS_CLASS
We call led_classdev_register/led_classdev_unregister in
create_netxbig_led/delete_netxbig_led, thus make LEDS_NETXBIG depend on
LEDS_CLASS.
This patch fixes below build error if LEDS_CLASS is not configured.
LD .tmp_vmlinux1
drivers/built-in.o: In function `create_netxbig_led':
drivers/leds/leds-netxbig.c:350: undefined reference to `led_classdev_register'
drivers/leds/leds-netxbig.c:361: undefined reference to `led_classdev_unregister'
drivers/built-in.o: In function `delete_netxbig_led':
drivers/leds/leds-netxbig.c:313: undefined reference to `led_classdev_unregister'
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Acked-by: Simon Guinot <sguinot@lacie.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Axel Lin [Tue, 26 Jul 2011 00:13:16 +0000 (17:13 -0700)]
drivers/leds/leds-sunfire.c: fix sunfire_led_generic_probe() error handling
- return -ENOMEM if kzalloc fails, rather than the current -EINVAL
- fix a memory leak in the case of goto out_unregister_led_cdevs
Signed-off-by: Axel Lin <axel.lin@gmail.com>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Walleij [Tue, 26 Jul 2011 00:13:15 +0000 (17:13 -0700)]
drivers/leds/leds-lp5521.c: provide section tagging
Tag the and remove() function as __devexit respectively.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Cc: Richard Purdie <rpurdie@rpsys.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 26 Jul 2011 00:13:14 +0000 (17:13 -0700)]
MAINTAINERS: update HIGH RESOLUTION TIMERS patterns
clockchips.h was typoed as clockevents.h
Signed-off-by: Joe Perches <joe@perches.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joe Perches [Tue, 26 Jul 2011 00:13:13 +0000 (17:13 -0700)]
get_maintainers.pl: improve .mailmap parsing
Entries that used formats other than "Proper Name <commit@email.xx>"
were not parsed properly.
Try to improve the parsing so that the entries in the forms of:
Proper Name <proper@email.xx> <commit@email.xx>
and
Proper Name <proper@email.xx> Commit Name <commit@email.xx>
are transformed correctly.
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Florian Mickler <florian@mickler.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Stephen Boyd [Tue, 26 Jul 2011 00:13:12 +0000 (17:13 -0700)]
kernel/configs.c: include MODULE_*() when CONFIG_IKCONFIG_PROC=n
If CONFIG_IKCONFIG=m but CONFIG_IKCONFIG_PROC=n we get a module that has
no MODULE_LICENSE definition. Move the MODULE_*() definitions outside the
CONFIG_IKCONFIG_PROC #ifdef to prevent this configuration from tainting
the kernel.
Signed-off-by: Stephen Boyd <bebarino@gmail.com>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Amerigo Wang [Tue, 26 Jul 2011 00:13:12 +0000 (17:13 -0700)]
notifiers: vt: move vt notifiers into vt.h
It is not necessary to share the same notifier.h.
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Amerigo Wang [Tue, 26 Jul 2011 00:13:11 +0000 (17:13 -0700)]
notifiers: pm: move pm notifiers into suspend.h
It is not necessary to share the same notifier.h.
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: David Miller <davem@davemloft.net>
Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Amerigo Wang [Tue, 26 Jul 2011 00:13:10 +0000 (17:13 -0700)]
notifiers: sys: move reboot notifiers into reboot.h
It is not necessary to share the same notifier.h.
This patch already moves register_reboot_notifier() and
unregister_reboot_notifier() from kernel/notifier.c to kernel/sys.c.
[amwang@redhat.com: make allyesconfig succeed on ppc64]
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Amerigo Wang [Tue, 26 Jul 2011 00:13:09 +0000 (17:13 -0700)]
notifiers: net: move netdevice notifiers into netdevice.h
It is not necessary to share the same notifier.h.
Signed-off-by: WANG Cong <amwang@redhat.com>
Acked-by: David Miller <davem@davemloft.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Amerigo Wang [Tue, 26 Jul 2011 00:13:08 +0000 (17:13 -0700)]
notifiers: cpu: move cpu notifiers into cpu.h
We presently define all kinds of notifiers in notifier.h. This is not
necessary at all, since different subsystems use different notifiers, they
are almost non-related with each other.
This can also save much build time. Suppose I add a new netdevice event,
really I don't have to recompile all the source, just network related.
Without this patch, all the source will be recompiled.
I move the notify events near to their subsystem notifier registers, so
that they can be found more easily.
This patch:
It is not necessary to share the same notifier.h.
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: David Miller <davem@davemloft.net>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Donggeun Kim [Tue, 26 Jul 2011 00:13:07 +0000 (17:13 -0700)]
drivers/misc: add support the FSA9480 USB Switch
The FSA9480 is a USB port accessory detector and switch. This patch adds
support the FSA9480 USB Switch.
[akpm@linux-foundation.org: make a couple of things static]
Signed-off-by: Donggeun Kim <dg77.kim@samsung.com>
Signed-off-by: Minkyu Kang <mk7.kang@samsung.com>
Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Philip A. Prindeville [Tue, 26 Jul 2011 00:13:05 +0000 (17:13 -0700)]
geode: reflect mfgpt dependency on mfd
As stated in drivers/mfd/cs5535-mfd.c, the mfd driver exposes the BARs
which then make the GPIO, MFGPT, ACPI, etc. all visible to the system.
So the dependencies of the MFGPT stuff have changed, and most people
expect Kconfig to bring in the necessary dependencies. Without them, the
module fails to load and most people don't understand why because the
details of the rewrite aren't captured anywhere most people who know to
look.
This dependency needs to be reflected in Kconfig.
Signed-off-by: Philip A. Prindeville <philipp@redfish-solutions.com>
Acked-by: Alexandros C. Couloumbis <alex@ozo.com>
Acked-by: Andres Salomon <dilinger@queued.net>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnaud Lacombe [Tue, 26 Jul 2011 00:13:04 +0000 (17:13 -0700)]
eisa/pci_eisa.c: fix section mismatch
Fixes
WARNING: vmlinux.o(.data+0x15d3ac): Section mismatch in reference from the variable pci_eisa_driver to the function .init.text:pci_eisa_init()
The variable pci_eisa_driver references the function __init pci_eisa_init()
If the reference is valid then annotate the variable with __init* or __refdata (see linux/init.h) or name the variable:
*_template, *_timer, *_sht, *_ops, *_probe, *_probe_one, *_console
Signed-off-by: Arnaud Lacombe <lacombar@gmail.com>
Cc: Greg KH <greg@kroah.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Arnaud Lacombe [Tue, 26 Jul 2011 00:13:03 +0000 (17:13 -0700)]
include/linux/kernel.h: hide internal macros from userspace
Unexpose to userland the following macros
- __FUNCTION__
- NUMA_BUILD
- COMPACTION_BUILD
- REBUILD_DUE_TO_FTRACE_MCOUNT_RECORD
Signed-off-by: Arnaud Lacombe <lacombar@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Steven Rostedt <rostedt@goodmis.org>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
WANG Cong [Tue, 26 Jul 2011 00:13:02 +0000 (17:13 -0700)]
include/linux/kernel.h: fix a headers_check warning
Fix the warning:
usr/include/linux/kernel.h:65: userspace cannot reference function or variable defined in the kernel
As Michal noted, BUILD_BUG_ON stuffs should be moved
under #ifdef __KERNEL__.
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Michal Marek <mmarek@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Uwe Kleine-König [Tue, 26 Jul 2011 00:13:00 +0000 (17:13 -0700)]
include/linux/ioport.h: new helper to define common struct resource constructs
Resource definitions that just define start, end and flags =
IORESOURCE_MEM or IORESOURCE_IRQ (with start=end) are quite common. So
introduce a shortcut for them. For completeness add macros for
IORESOURCE_DMA and IORESOURCE_IO, too and also make available a set of
macros to specify named resources of all types which are less common.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Maxin B John [Tue, 26 Jul 2011 00:12:59 +0000 (17:12 -0700)]
devres: fix possible use after free
devres uses the pointer value as key after it's freed, which is safe but
triggers spurious use-after-free warnings on some static analysis tools.
Rearrange code to avoid such warnings.
Signed-off-by: Maxin B. John <maxin.john@gmail.com>
Reviewed-by: Rolf Eike Beer <eike-kernel@sf-tec.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Frysinger [Tue, 26 Jul 2011 00:12:58 +0000 (17:12 -0700)]
asm-generic/system.h: drop useless __KERNEL__
This header isn't exported to user-space, and even if it was, the
__KERNEL__ check covers the entire file, so we'd get a useless stub in the
first place. So punt it.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Rakib Mullick [Tue, 26 Jul 2011 00:12:56 +0000 (17:12 -0700)]
drivers: use kzalloc/kcalloc instead of 'kmalloc+memset', where possible
Signed-off-by: Rakib Mullick <rakib.mullick@gmail.com>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: David Airlie <airlied@linux.ie>
Cc: Tejun Heo <tj@kernel.org>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Weinberger [Tue, 26 Jul 2011 00:12:55 +0000 (17:12 -0700)]
um: remove dead code
GCC 4.6's -Wunused-but-set-variable found some dead code.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Weinberger [Tue, 26 Jul 2011 00:12:55 +0000 (17:12 -0700)]
um: adjust size of pid_buf
Linux can have pids up to 4*1024*1024. To handle such huge numbers
pid_buf needs to be larger.
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Weinberger [Tue, 26 Jul 2011 00:12:54 +0000 (17:12 -0700)]
um: implement a x86_64 vDSO
Until now UML had no x86_64 vDSO. So glibc always used the vsyscall page
for gettimeday() and friends. Calls to gettimeday() returned falsely the
host time and confused some programs.
This patch adds a vDSO which turns all __vdso_* calls into a system call
so that UML can trap them.
As glibc still uses the vsyscall page for static binaries this patch
improves the situation only for dynamic binaries.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Weinberger [Tue, 26 Jul 2011 00:12:54 +0000 (17:12 -0700)]
um: set __HAVE_ARCH_GATE_AREA for x86_64
Implement arch_vma_name() and make get_gate_vma(), in_gate_area() and
in_gate_area_no_mm() a nop.
We need arch_vma_name() to support vDSO.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Weinberger [Tue, 26 Jul 2011 00:12:53 +0000 (17:12 -0700)]
um: Set __HAVE_ARCH_GATE_AREA for i386
When UML is unable to reuse the host's vDSO FIXADDR_USER_START is zero.
To handle this special case correclty we have to implement custom gate
area helper methods.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Weinberger [Tue, 26 Jul 2011 00:12:52 +0000 (17:12 -0700)]
um: disable scan_elf_aux() on x86_64
Reusing the host's vDSO makes only sense on x86_32.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Davidlohr Bueso [Tue, 26 Jul 2011 00:12:52 +0000 (17:12 -0700)]
uml: free resources
When creating the temp file there's a memory and file descriptor leak upon
error.
Signed-off-by: Davidlohr Bueso <dave@gnu.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Reviewed-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaliy Ivanov [Tue, 26 Jul 2011 00:12:51 +0000 (17:12 -0700)]
uml: drivers/slip_user.c memory leak fix
Do not free memory when you failed to allocate it.
Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaliy Ivanov [Tue, 26 Jul 2011 00:12:50 +0000 (17:12 -0700)]
uml: helper.c warning corrections
Fix this warning:
arch/um/os-Linux/helper.c: In function `helper_child':
arch/um/os-Linux/helper.c:38:7: warning: ignoring return value of `write', declared with attribute warn_unused_result
[richard@nod.at: happens only with -D_FORTIFY_SOURCE=2]
Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaliy Ivanov [Tue, 26 Jul 2011 00:12:50 +0000 (17:12 -0700)]
uml: cow_user.c warning corrections
Fix this warning:
arch/um/drivers/cow_user.c: In function `absolutize':
arch/um/drivers/cow_user.c:189:7: warning: ignoring return value of `chdir', declared with attribute warn_unused_result
[richard@nod.at: happens only with -D_FORTIFY_SOURCE=2]
Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Vitaliy Ivanov [Tue, 26 Jul 2011 00:12:49 +0000 (17:12 -0700)]
uml: drivers/net_user.c memory leak fix
Perform memory cleanup on exit. On receiving invalid 'pid' we still
should clean 'output' variable.
Signed-off-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Geert Uytterhoeven [Tue, 26 Jul 2011 00:12:48 +0000 (17:12 -0700)]
um: reinstate kernel version in generated .config
Commit
0954828fcbf3 ("kconfig: replace KERNELVERSION usage by the
mainmenu's prompt") made the kernel version disappear from the generated
.config file when configuring for UML. As UML's Kconfig doesn't have a
mainmenu, kconfig falls back to the default string "Linux Kernel
Configuration".
Add a suitable mainmenu to the main UML Kconfig file to fix this.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Weinberger [Tue, 26 Jul 2011 00:12:48 +0000 (17:12 -0700)]
um: add netpoll support
To make netconsole usable on UML, its ethernet driver needs netpoll
support.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Weinberger [Tue, 26 Jul 2011 00:12:47 +0000 (17:12 -0700)]
um: fix _FORTIFY_SOURCE=2 support for kernel modules
When UML is compiled with _FORTIFY_SOURCE we have to export all _chk()
functions which are used in modules. For now it's only the case for
__sprintf_chk().
Tested-by: Florian Fainelli <florian@openwrt.org>
Reported-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Vitaliy Ivanov <vitalivanov@gmail.com>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Weinberger [Tue, 26 Jul 2011 00:12:46 +0000 (17:12 -0700)]
um: clean up delay functions
Both sys-i386 and sys-x86_64 support now ndelay(). The delay functions
are based on arch/x86/lib/delay.c.
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mathias Krause [Tue, 26 Jul 2011 00:12:45 +0000 (17:12 -0700)]
um, exec: remove redundant set_fs(USER_DS)
The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Richard Weinberger <richard@nod.at>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Richard Weinberger [Tue, 26 Jul 2011 00:12:44 +0000 (17:12 -0700)]
um: clean up vm-flags.h
There is no need to define VM_{STACK,DATA}_DEFAULT_FLAGS as variable.
It's also useless to test for TIF_IA32 as 64bit UML has no IA32 emulation.
Signed-off-by: Richard Weinberger <richard@nod.at>
Acked-by: Randy Dunlap <rdunlap@xenotime.net>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mathias Krause [Tue, 26 Jul 2011 00:12:44 +0000 (17:12 -0700)]
cris, exec: remove redundant set_fs(USER_DS)
The address limit is already set in flush_old_exec() so those calls to
set_fs(USER_DS) are redundant.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
WANG Cong [Tue, 26 Jul 2011 00:12:43 +0000 (17:12 -0700)]
cris: fix some build warnings in pinmux.c
Fix some harmless warnings such as
arch/cris/arch-v32/mach-a3/pinmux.c:273: warning: ISO C90 forbids mixed declarations and code:
Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Mikael Starvik <starvik@axis.com>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mathias Krause [Tue, 26 Jul 2011 00:12:42 +0000 (17:12 -0700)]
m68k, exec: remove redundant set_fs(USER_DS)
The address limit is already set in flush_old_exec() so those calls to
set_fs(USER_DS) are redundant.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mathias Krause [Tue, 26 Jul 2011 00:12:40 +0000 (17:12 -0700)]
m32r, exec: remove redundant set_fs(USER_DS)
The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Hirokazu Takata <takata@linux-m32r.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mathias Krause [Tue, 26 Jul 2011 00:12:39 +0000 (17:12 -0700)]
alpha, exec: remove redundant set_fs(USER_DS)
The address limit is already set in flush_old_exec() so this
set_fs(USER_DS) is redundant.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mathias Krause [Tue, 26 Jul 2011 00:12:38 +0000 (17:12 -0700)]
h8300, exec: remove redundant set_fs(USER_DS)
The address limit is already set in flush_old_exec() so those calls to
set_fs(USER_DS) are redundant.
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Wu Fengguang [Tue, 26 Jul 2011 00:12:37 +0000 (17:12 -0700)]
writeback: account NR_WRITTEN at IO completion time
NR_WRITTEN is now accounted at block IO enqueue time, which is not very
accurate as to common understanding. This moves NR_WRITTEN accounting to
the IO completion time and makes it more consistent with BDI_WRITTEN,
which is used for bandwidth estimation.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Cc: Michael Rubin <mrubin@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 26 Jul 2011 00:12:37 +0000 (17:12 -0700)]
tmpfs: simplify unuse and writepage
shmem_unuse_inode() and shmem_writepage() contain a little code to cope
with pages inserted independently into the filecache, probably by a
filesystem stacked on top of tmpfs, then fed to its ->readpage() or
->writepage().
Unionfs was indeed experimenting with working in that way three years ago,
but I find no current examples: nowadays the stacking filesystems use vfs
interfaces to the lower filesystem.
It's now illegal: remove most of that code, adding some WARN_ON_ONCEs.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Erez Zadok <ezk@fsl.cs.sunysb.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 26 Jul 2011 00:12:36 +0000 (17:12 -0700)]
tmpfs: simplify filepage/swappage
We can now simplify shmem_getpage_gfp(): there is no longer a dilemma of
filepage passed in via shmem_readpage(), then swappage found, which must
then be copied over to it.
Although at first it's tempting to replace the **pagep arg by returning
struct page *, that makes a mess of IS_ERR_OR_NULL(page)s in all the
callers, so leave as is.
Insert BUG_ON(!PageUptodate) when we find and lock page: some of the
complication came from uninitialized pages inserted into filecache prior
to readpage; but now we're in control, and only release pagelock on
filecache once it's uptodate (if an error occurs in reading back from
swap, the page remains in swapcache, never moved to filecache).
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 26 Jul 2011 00:12:35 +0000 (17:12 -0700)]
tmpfs: simplify prealloc_page
The prealloc_page handling in shmem_getpage_gfp() is unnecessarily
complicated: first simplify that before going on to filepage/swappage.
That's right, don't report ENOMEM when the preallocation fails: we may or
may not need the page. But simply report ENOMEM once we find we do need
it, instead of dropping lock, repeating allocation, unwinding on failure
etc. And leave the out label on the fast path, don't goto.
Fix something that looks like a bug but turns out not to be: set
PageSwapBacked on prealloc_page before its mem_cgroup_cache_charge(), as
the removed case was doing. That's important before adding to LRU
(determines which LRU the page goes on), and does affect which path it
takes through memcontrol.c, but in the end MEM_CGROUP_CHANGE_TYPE_ SHMEM
is handled no differently from CACHE.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Shaohua Li <shaohua.li@intel.com>
Cc: "Zhang, Yanmin" <yanmin.zhang@intel.com>
Cc: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 26 Jul 2011 00:12:34 +0000 (17:12 -0700)]
tmpfs: remove_shmem_readpage
Remove that pernicious shmem_readpage() at last: the things we needed it
for (splice, loop, sendfile, i915 GEM) are now fully taken care of by
shmem_file_splice_read() and shmem_read_mapping_page_gfp().
This removal clears the way for a simpler shmem_getpage_gfp(), since page
is never passed in; but leave most of that cleanup until after.
sys_readahead() and sys_fadvise(POSIX_FADV_WILLNEED) will now EINVAL,
instead of unexpectedly trying to read ahead on tmpfs: if that proves to
be an issue for someone, then we can either arrange for them to return
success instead, or try to implement async readahead on tmpfs.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 26 Jul 2011 00:12:34 +0000 (17:12 -0700)]
tmpfs: pass gfp to shmem_getpage_gfp
Make shmem_getpage() a wrapper, passing mapping_gfp_mask() down to
shmem_getpage_gfp(), which in turn passes gfp down to shmem_swp_alloc().
Change shmem_read_mapping_page_gfp() to use shmem_getpage_gfp() in the
CONFIG_SHMEM case; but leave tiny !SHMEM using read_cache_page_gfp().
Add a BUG_ON() in case anyone happens to call this on a non-shmem mapping;
though we might later want to let that case route to read_cache_page_gfp().
It annoys me to have these two almost-redundant args, gfp and fault_type:
I can't find a better way; but initialize fault_type only in shmem_fault().
Note that before, read_cache_page_gfp() was allocating i915_gem's pages
with __GFP_NORETRY as intended; but the corresponding swap vector pages
got allocated without it, leaving a small possibility of OOM.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 26 Jul 2011 00:12:33 +0000 (17:12 -0700)]
tmpfs: refine shmem_file_splice_read
Tidy up shmem_file_splice_read():
Remove readahead: okay, we could implement shmem readahead on swap,
but have never done so before, swap being the slow exceptional path.
Use shmem_getpage() instead of find_or_create_page() plus ->readpage().
Remove several comments: sorry, I found them more distracting than
helpful, and this will not be the reference version of splice_read().
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 26 Jul 2011 00:12:32 +0000 (17:12 -0700)]
tmpfs: clone shmem_file_splice_read()
Copy __generic_file_splice_read() and generic_file_splice_read() from
fs/splice.c to shmem_file_splice_read() in mm/shmem.c. Make
page_cache_pipe_buf_ops and spd_release_page() accessible to it.
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: Jens Axboe <jaxboe@fusionio.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Benjamin Herrenschmidt [Tue, 26 Jul 2011 00:12:32 +0000 (17:12 -0700)]
mm/futex: fix futex writes on archs with SW tracking of dirty & young
I haven't reproduced it myself but the fail scenario is that on such
machines (notably ARM and some embedded powerpc), if you manage to hit
that futex path on a writable page whose dirty bit has gone from the PTE,
you'll livelock inside the kernel from what I can tell.
It will go in a loop of trying the atomic access, failing, trying gup to
"fix it up", getting succcess from gup, go back to the atomic access,
failing again because dirty wasn't fixed etc...
So I think you essentially hang in the kernel.
The scenario is probably rare'ish because affected architecture are
embedded and tend to not swap much (if at all) so we probably rarely hit
the case where dirty is missing or young is missing, but I think Shan has
a piece of SW that can reliably reproduce it using a shared writable
mapping & fork or something like that.
On archs who use SW tracking of dirty & young, a page without dirty is
effectively mapped read-only and a page without young unaccessible in the
PTE.
Additionally, some architectures might lazily flush the TLB when relaxing
write protection (by doing only a local flush), and expect a fault to
invalidate the stale entry if it's still present on another processor.
The futex code assumes that if the "in_atomic()" access -EFAULT's, it can
"fix it up" by causing get_user_pages() which would then be equivalent to
taking the fault.
However that isn't the case. get_user_pages() will not call
handle_mm_fault() in the case where the PTE seems to have the right
permissions, regardless of the dirty and young state. It will eventually
update those bits ... in the struct page, but not in the PTE.
Additionally, it will not handle the lazy TLB flushing that can be
required by some architectures in the fault case.
Basically, gup is the wrong interface for the job. The patch provides a
more appropriate one which boils down to just calling handle_mm_fault()
since what we are trying to do is simulate a real page fault.
The futex code currently attempts to write to user memory within a
pagefault disabled section, and if that fails, tries to fix it up using
get_user_pages().
This doesn't work on archs where the dirty and young bits are maintained
by software, since they will gate access permission in the TLB, and will
not be updated by gup().
In addition, there's an expectation on some archs that a spurious write
fault triggers a local TLB flush, and that is missing from the picture as
well.
I decided that adding those "features" to gup() would be too much for this
already too complex function, and instead added a new simpler
fixup_user_fault() which is essentially a wrapper around handle_mm_fault()
which the futex code can call.
[akpm@linux-foundation.org: coding-style fixes]
[akpm@linux-foundation.org: fix some nits Darren saw, fiddle comment layout]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reported-by: Shan Hai <haishan.bai@gmail.com>
Tested-by: Shan Hai <haishan.bai@gmail.com>
Cc: David Laight <David.Laight@ACULAB.COM>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Darren Hart <darren.hart@intel.com>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Konstantin Khlebnikov [Tue, 26 Jul 2011 00:12:31 +0000 (17:12 -0700)]
mm: remove useless rcu lock-unlock from mapping_tagged()
radix_tree_tagged() is lockless - it reads from a member of the raid-tree
root node. It does not require any protection.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 26 Jul 2011 00:12:30 +0000 (17:12 -0700)]
mm: page allocator: reconsider zones for allocation after direct reclaim
With zone_reclaim_mode enabled, it's possible for zones to be considered
full in the zonelist_cache so they are skipped in the future. If the
process enters direct reclaim, the ZLC may still consider zones to be full
even after reclaiming pages. Reconsider all zones for allocation if
direct reclaim returns successfully.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mel Gorman [Tue, 26 Jul 2011 00:12:29 +0000 (17:12 -0700)]
mm: page allocator: initialise ZLC for first zone eligible for zone_reclaim
There have been a small number of complaints about significant stalls
while copying large amounts of data on NUMA machines reported on a
distribution bugzilla. In these cases, zone_reclaim was enabled by
default due to large NUMA distances. In general, the complaints have not
been about the workload itself unless it was a file server (in which case
the recommendation was disable zone_reclaim).
The stalls are mostly due to significant amounts of time spent scanning
the preferred zone for pages to free. After a failure, it might fallback
to another node (as zonelists are often node-ordered rather than
zone-ordered) but stall quickly again when the next allocation attempt
occurs. In bad cases, each page allocated results in a full scan of the
preferred zone.
Patch 1 checks the preferred zone for recent allocation failure
which is particularly important if zone_reclaim has failed
recently. This avoids rescanning the zone in the near future and
instead falling back to another node. This may hurt node locality
in some cases but a failure to zone_reclaim is more expensive than
a remote access.
Patch 2 clears the zlc information after direct reclaim.
Otherwise, zone_reclaim can mark zones full, direct reclaim can
reclaim enough pages but the zone is still not considered for
allocation.
This was tested on a 24-thread 2-node x86_64 machine. The tests were
focused on large amounts of IO. All tests were bound to the CPUs on
node-0 to avoid disturbances due to processes being scheduled on different
nodes. The kernels tested are
3.0-rc6-vanilla Vanilla 3.0-rc6
zlcfirst Patch 1 applied
zlcreconsider Patches 1+2 applied
FS-Mark
./fs_mark -d /tmp/fsmark-10813 -D 100 -N 5000 -n 208 -L 35 -t 24 -S0 -s 524288
fsmark-3.0-rc6 3.0-rc6 3.0-rc6
vanilla zlcfirs zlcreconsider
Files/s min 54.90 ( 0.00%) 49.80 (-10.24%) 49.10 (-11.81%)
Files/s mean 100.11 ( 0.00%) 135.17 (25.94%) 146.93 (31.87%)
Files/s stddev 57.51 ( 0.00%) 138.97 (58.62%) 158.69 (63.76%)
Files/s max 361.10 ( 0.00%) 834.40 (56.72%) 802.40 (55.00%)
Overhead min 76704.00 ( 0.00%) 76501.00 ( 0.27%) 77784.00 (-1.39%)
Overhead mean
1485356.51 ( 0.00%)
1035797.83 (43.40%)
1594680.26 (-6.86%)
Overhead stddev
1848122.53 ( 0.00%) 881489.88 (109.66%)
1772354.90 ( 4.27%)
Overhead max
7989060.00 ( 0.00%)
3369118.00 (137.13%)
10135324.00 (-21.18%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 501.49 493.91 499.93
Total Elapsed Time (seconds) 2451.57 2257.48 2215.92
MMTests Statistics: vmstat
Page Ins 46268 63840 66008
Page Outs
90821596 90671128 88043732
Swap Ins 0 0 0
Swap Outs 0 0 0
Direct pages scanned
13091697 8966863 8971790
Kswapd pages scanned 0
1830011 1831116
Kswapd pages reclaimed 0
1829068 1829930
Direct pages reclaimed
13037777 8956828 8648314
Kswapd efficiency 100% 99% 99%
Kswapd velocity 0.000 810.643 826.346
Direct efficiency 99% 99% 96%
Direct velocity 5340.128 3972.068 4048.788
Percentage direct scans 100% 83% 83%
Page writes by reclaim 0 3 0
Slabs scanned 796672 720640 720256
Direct inode steals
7422667 7160012 7088638
Kswapd inode steals 0
1736840 2021238
Test completes far faster with a large increase in the number of files
created per second. Standard deviation is high as a small number of
iterations were much higher than the mean. The number of pages scanned by
zone_reclaim is reduced and kswapd is used for more work.
LARGE DD
3.0-rc6 3.0-rc6 3.0-rc6
vanilla zlcfirst zlcreconsider
download tar 59 ( 0.00%) 59 ( 0.00%) 55 ( 7.27%)
dd source files 527 ( 0.00%) 296 (78.04%) 320 (64.69%)
delete source 36 ( 0.00%) 19 (89.47%) 20 (80.00%)
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 125.03 118.98 122.01
Total Elapsed Time (seconds) 624.56 375.02 398.06
MMTests Statistics: vmstat
Page Ins
3594216 439368 407032
Page Outs
23380832 23380488 23377444
Swap Ins 0 0 0
Swap Outs 0 436 287
Direct pages scanned
17482342 69315973 82864918
Kswapd pages scanned 0 519123 575425
Kswapd pages reclaimed 0 466501 522487
Direct pages reclaimed
5858054 2732949 2712547
Kswapd efficiency 100% 89% 90%
Kswapd velocity 0.000 1384.254 1445.574
Direct efficiency 33% 3% 3%
Direct velocity 27991.453 184832.737 208171.929
Percentage direct scans 100% 99% 99%
Page writes by reclaim 0 5082 13917
Slabs scanned 17280 29952 35328
Direct inode steals 115257
1431122 332201
Kswapd inode steals 0 0 979532
This test downloads a large tarfile and copies it with dd a number of
times - similar to the most recent bug report I've dealt with. Time to
completion is reduced. The number of pages scanned directly is still
disturbingly high with a low efficiency but this is likely due to the
number of dirty pages encountered. The figures could probably be improved
with more work around how kswapd is used and how dirty pages are handled
but that is separate work and this result is significant on its own.
Streaming Mapped Writer
MMTests Statistics: duration
User/Sys Time Running Test (seconds) 124.47 111.67 112.64
Total Elapsed Time (seconds) 2138.14 1816.30 1867.56
MMTests Statistics: vmstat
Page Ins 90760 89124 89516
Page Outs
121028340 120199524 120736696
Swap Ins 0 86 55
Swap Outs 0 0 0
Direct pages scanned
114989363 96461439 96330619
Kswapd pages scanned
56430948 56965763 57075875
Kswapd pages reclaimed
27743219 27752044 27766606
Direct pages reclaimed 49777 46884 36655
Kswapd efficiency 49% 48% 48%
Kswapd velocity 26392.541 31363.631 30561.736
Direct efficiency 0% 0% 0%
Direct velocity 53780.091 53108.759 51581.004
Percentage direct scans 67% 62% 62%
Page writes by reclaim 385 122 1513
Slabs scanned 43008 39040 42112
Direct inode steals 0 10 8
Kswapd inode steals 733 534 477
This test just creates a large file mapping and writes to it linearly.
Time to completion is again reduced.
The gains are mostly down to two things. In many cases, there is less
scanning as zone_reclaim simply gives up faster due to recent failures.
The second reason is that memory is used more efficiently. Instead of
scanning the preferred zone every time, the allocator falls back to
another zone and uses it instead improving overall memory utilisation.
This patch: initialise ZLC for first zone eligible for zone_reclaim.
The zonelist cache (ZLC) is used among other things to record if
zone_reclaim() failed for a particular zone recently. The intention is to
avoid a high cost scanning extremely long zonelists or scanning within the
zone uselessly.
Currently the zonelist cache is setup only after the first zone has been
considered and zone_reclaim() has been called. The objective was to avoid
a costly setup but zone_reclaim is itself quite expensive. If it is
failing regularly such as the first eligible zone having mostly mapped
pages, the cost in scanning and allocation stalls is far higher than the
ZLC initialisation step.
This patch initialises ZLC before the first eligible zone calls
zone_reclaim(). Once initialised, it is checked whether the zone failed
zone_reclaim recently. If it has, the zone is skipped. As the first zone
is now being checked, additional care has to be taken about zones marked
full. A zone can be marked "full" because it should not have enough
unmapped pages for zone_reclaim but this is excessive as direct reclaim or
kswapd may succeed where zone_reclaim fails. Only mark zones "full" after
zone_reclaim fails if it failed to reclaim enough pages after scanning.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: Minchan Kim <minchan.kim@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
KAMEZAWA Hiroyuki [Tue, 26 Jul 2011 00:12:27 +0000 (17:12 -0700)]
mm: preallocate page before lock_page() at filemap COW
Currently we are keeping faulted page locked throughout whole __do_fault
call (except for page_mkwrite code path) after calling file system's fault
code. If we do early COW, we allocate a new page which has to be charged
for a memcg (mem_cgroup_newpage_charge).
This function, however, might block for unbounded amount of time if memcg
oom killer is disabled or fork-bomb is running because the only way out of
the OOM situation is either an external event or OOM-situation fix.
In the end we are keeping the faulted page locked and blocking other
processes from faulting it in which is not good at all because we are
basically punishing potentially an unrelated process for OOM condition in
a different group (I have seen stuck system because of ld-2.11.1.so being
locked).
We can do test easily.
% cgcreate -g memory:A
% cgset -r memory.limit_in_bytes=64M A
% cgset -r memory.memsw.limit_in_bytes=64M A
% cd kernel_dir; cgexec -g memory:A make -j
Then, the whole system will live-locked until you kill 'make -j'
by hands (or push reboot...) This is because some important page in a
a shared library are locked.
Considering again, the new page is not necessary to be allocated
with lock_page() held. And usual page allocation may dive into
long memory reclaim loop with holding lock_page() and can cause
very long latency.
There are 3 ways.
1. do allocation/charge before lock_page()
Pros. - simple and can handle page allocation in the same manner.
This will reduce holding time of lock_page() in general.
Cons. - we do page allocation even if ->fault() returns error.
2. do charge after unlock_page(). Even if charge fails, it's just OOM.
Pros. - no impact to non-memcg path.
Cons. - implemenation requires special cares of LRU and we need to modify
page_add_new_anon_rmap()...
3. do unlock->charge->lock again method.
Pros. - no impact to non-memcg path.
Cons. - This may kill LOCK_PAGE_RETRY optimization. We need to release
lock and get it again...
This patch moves "charge" and memory allocation for COW page
before lock_page(). Then, we can avoid scanning LRU with holding
a lock on a page and latency under lock_page() will be reduced.
Then, above livelock disappears.
[akpm@linux-foundation.org: fix code layout]
Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Reported-by: Lutz Vieweg <lvml@5t9.de>
Original-idea-by: Michal Hocko <mhocko@suse.cz>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Ying Han <yinghan@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 26 Jul 2011 00:12:26 +0000 (17:12 -0700)]
tmpfs: no need to use i_lock
2.6.36's
7e496299d4d2 ("tmpfs: make tmpfs scalable with percpu_counter for
used blocks") to make tmpfs scalable with percpu_counter used
inode->i_lock in place of sbinfo->stat_lock around i_blocks updates; but
that was adverse to scalability, and unnecessary, since info->lock is
already held there in the fast paths.
Remove those uses of i_lock, and add info->lock in the three error paths
where it's then needed across shmem_free_blocks(). It's not actually
needed across shmem_unacct_blocks(), but they're so often paired that it
looks wrong to split them apart.
Signed-off-by: Hugh Dickins <hughd@google.com>
Acked-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 26 Jul 2011 00:12:25 +0000 (17:12 -0700)]
mm: pincer in truncate_inode_pages_range
truncate_inode_pages_range()'s final loop has a nice pincer property,
bringing start and end together, squeezing out the last pages. But the
range handling missed out on that, just sliding up the range, perhaps
letting pages come in behind it. Add one more test to give it the same
pincer effect.
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Hugh Dickins [Tue, 26 Jul 2011 00:12:25 +0000 (17:12 -0700)]
mm: consistent truncate and invalidate loops
Make the pagevec_lookup loops in truncate_inode_pages_range(),
invalidate_mapping_pages() and invalidate_inode_pages2_range() more
consistent with each other.
They were relying upon page->index of an unlocked page, but apologizing
for it: accept it, embrace it, add comments and WARN_ONs, and simplify the
index handling.
invalidate_inode_pages2_range() had special handling for a wrapped
page->index + 1 = 0 case; but MAX_LFS_FILESIZE doesn't let us anywhere
near there, and a corrupt page->index in the radix_tree could cause more
trouble than that would catch. Remove that wrapped handling.
invalidate_inode_pages2_range() uses min() to limit the pagevec_lookup
when near the end of the range: copy that into the other two, although
it's less useful than you might think (it limits the use of the buffer,
rather than the indices looked up).
Signed-off-by: Hugh Dickins <hughd@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>