git.openwrt.org Git - openwrt/staging/blogic.git/log

PM / Sleep: Simplify generic system suspend callbacks

The pm_runtime_suspended() check in __pm_generic_call() doesn't
really help and may cause problems to happen, because in some cases
the system suspend callbacks need to be called even if the given
device has been suspended by runtime PM. For example, if the device
generally supports remote wakeup and is not enabled to wake up
the system from sleep, it should be prevented from generating wakeup
signals during system suspend and that has to be done by the
suspend callbacks that the pm_runtime_suspended() check prevents from
being executed.

Similarly, it may not be a good idea to unconditionally change
the runtime PM status of the device to 'active' in
__pm_generic_resume(), because the driver may want to leave the
device in the 'suspended' state, depending on what happened to it
before the system suspend and whether or not it is enabled to
wake up the system.

For the above reasons, remove the pm_runtime_suspended()
check from __pm_generic_call() and remove the code changing the
device's runtime PM status from __pm_generic_resume().

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM / Hibernate: Remove deprecated hibernation snapshot ioctls

Several snapshot ioctls were marked for removal quite some time ago,
since they were deprecated. Remove them.

Suggested-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM / Sleep: Fix freezer failures due to racy usermodehelper_is_disabled()

Commit a144c6a (PM: Print a warning if firmware is requested when tasks
are frozen) introduced usermodehelper_is_disabled() to warn and exit
immediately if firmware is requested when usermodehelpers are disabled.

However, it is racy. Consider the following scenario, currently used in
drivers/base/firmware_class.c:

...
if (usermodehelper_is_disabled())
goto out;

/* Do actual work */
...

out:
return err;

Nothing prevents someone from disabling usermodehelpers just after the check
in the 'if' condition, which means that it is quite possible to try doing the
"actual work" with usermodehelpers disabled, leading to undesirable
consequences.

In particular, this race condition in _request_firmware() causes task freezing
failures whenever suspend/hibernation is in progress because, it wrongly waits
to get the firmware/microcode image from userspace when actually the
usermodehelpers are disabled or userspace has been frozen.
Some of the example scenarios that cause freezing failures due to this race
are those that depend on userspace via request_firmware(), such as x86
microcode module initialization and microcode image reload.

Previous discussions about this issue can be found at:
http://thread.gmane.org/gmane.linux.kernel/1198291/focus=1200591

This patch adds proper synchronization to fix this issue.

It is to be noted that this patchset fixes the freezing failures but doesn't
remove the warnings. IOW, it does not attempt to add explicit synchronization
to x86 microcode driver to avoid requesting microcode image at inopportune
moments. Because, the warnings were introduced to highlight such cases, in the
first place. And we need not silence the warnings, since we take care of the
*real* problem (freezing failure) and hence, after that, the warnings are
pretty harmless anyway.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM / Sleep: Recommend [un]lock_system_sleep() over using pm_mutex directly

Update the documentation to explain the perils of directly using
mutex_[un]lock(&pm_mutex) and recommend the usage of the safe
APIs [un]lock_system_sleep() instead.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM / Sleep: Replace mutex_[un]lock(&pm_mutex) with [un]lock_system_sleep()

Using [un]lock_system_sleep() is safer than directly using mutex_[un]lock()
on 'pm_mutex', since the latter could lead to freezing failures. Hence convert
all the present users of mutex_[un]lock(&pm_mutex) to use these safe APIs
instead.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Reviewed-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM / Sleep: Make [un]lock_system_sleep() generic

The [un]lock_system_sleep() APIs were originally introduced to mutually
exclude memory hotplug and hibernation.

Directly using mutex_lock(&pm_mutex) to achieve mutual exclusion with
suspend or hibernation code can lead to freezing failures. However, the
APIs [un]lock_system_sleep() can be safely used to achieve the same,
without causing freezing failures.

So, since it would be beneficial to modify all the existing users of
mutex_lock(&pm_mutex) (in all parts of the kernel), so that they use these
safe APIs intead, make these APIs generic by removing the restriction that
they work only when CONFIG_HIBERNATE_CALLBACKS is set. Moreover, that
restriction didn't buy us anything anyway.

Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM / Sleep: Use the freezer_count() functions in [un]lock_system_sleep() APIs

Now that freezer_count() and freezer_do_not_count() don't have the restriction
that they are effective only when called by userspace processes, use
them in lock_system_sleep() and unlock_system_sleep() instead of open-coding
their parts.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Merge branch 'pm-freezer' into pm-sleep

* pm-freezer:
PM / Freezer: Remove the "userspace only" constraint from freezer[_do_not]_count()

PM / Freezer: Remove the "userspace only" constraint from freezer[_do_not]_count()

At present, the functions freezer_count() and freezer_do_not_count()
impose the restriction that they are effective only for userspace processes.
However, now, these functions have found more utility than originally
intended by the commit which introduced it: ba96a0c8 (freezer:
fix vfork problem). And moreover, even the vfork issue actually does not
need the above restriction in these functions.

So, modify these functions to make them work even for kernel threads, so
that they can be used at other places in the kernel, where the userspace
restriction doesn't apply.

Suggested-by: Oleg Nesterov <oleg@redhat.com>
Suggested-by: Tejun Heo <tj@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM / Hibernate: Replace unintuitive 'if' condition in kernel/power/user.c with 'else'

In the snapshot_ioctl() function, under SNAPSHOT_FREEZE, the code below
freeze_processes() is a bit unintuitive. Improve it by replacing the
second 'if' condition with an 'else' clause.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Merge branch 'pm-freezer' into pm-sleep

* pm-freezer: (26 commits)
  Freezer / sunrpc / NFS: don't allow TASK_KILLABLE sleeps to block the freezer
  Freezer: fix more fallout from the thaw_process rename
  freezer: fix wait_event_freezable/__thaw_task races
  freezer: kill unused set_freezable_with_signal()
  dmatest: don't use set_freezable_with_signal()
  usb_storage: don't use set_freezable_with_signal()
  freezer: remove unused @sig_only from freeze_task()
  freezer: use lock_task_sighand() in fake_signal_wake_up()
  freezer: restructure __refrigerator()
  freezer: fix set_freezable[_with_signal]() race
  freezer: remove should_send_signal() and update frozen()
  freezer: remove now unused TIF_FREEZE
  freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE
  cgroup_freezer: prepare for removal of TIF_FREEZE
  freezer: clean up freeze_processes() failure path
  freezer: kill PF_FREEZING
  freezer: test freezable conditions while holding freezer_lock
  freezer: make freezing indicate freeze condition in effect
  freezer: use dedicated lock instead of task_lock() + memory barrier
  freezer: don't distinguish nosig tasks on thaw
  ...

Freezer / sunrpc / NFS: don't allow TASK_KILLABLE sleeps to block the freezer

Allow the freezer to skip wait_on_bit_killable sleeps in the sunrpc
layer. This should allow suspend and hibernate events to proceed, even
when there are RPC's pending on the wire.

Also, wrap the TASK_KILLABLE sleeps in NFS layer in freezer_do_not_count
and freezer_count calls. This allows the freezer to skip tasks that are
sleeping while looping on EJUKEBOX or NFS4ERR_DELAY sorts of errors.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM / Sleep: Unify diagnostic messages from device suspend/resume

Make pm_op() and pm_noirq_op() use the same helper function for
running callbacks, which will cause them to use the same format of
diagnostic messages. This also reduces the complexity and size of
the code quite a bit.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Greg Kroah-Hartman <gregkh@suse.de>

ACPI / PM: Do not save/restore NVS on Asus K54C/K54HR

The models do not resume correctly without acpi_sleep=nonvs.

Signed-off-by: Keng-Yu Lin <kengyu@canonical.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM / Hibernate: Remove deprecated hibernation test modes

The hibernation test modes 'test' and 'testproc' are deprecated, because
the 'pm_test' framework offers much more fine-grained control for debugging
suspend and hibernation related problems.

So, remove the deprecated 'test' and 'testproc' hibernation test modes.

Suggested-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM / Hibernate: Thaw processes in SNAPSHOT_CREATE_IMAGE ioctl test path

Commit 2aede851ddf08666f68ffc17be446420e9d2a056 (PM / Hibernate: Freeze
kernel threads after preallocating memory) moved the freezing of kernel
threads to hibernation_snapshot() function.

So now, if the call to hibernation_snapshot() returns early due to a
successful hibernation test, the caller has to thaw processes to ensure
that the system gets back to its original state.

But in SNAPSHOT_CREATE_IMAGE hibernation ioctl, the caller does not thaw
processes in case hibernation_snapshot() returned due to a successful
freezer test. Fix this issue. But note we still send the value of 'in_suspend'
(which is now 0) to userspace, because we are not in an error path per-se,
and moreover, the value of in_suspend correctly depicts the situation here.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM / Hibernate: Enable usermodehelpers in software_resume() error path

In the software_resume() function defined in kernel/power/hibernate.c,
if the call to create_basic_memory_bitmaps() fails, the usermodehelpers
are not enabled (which had been disabled in the previous step). Fix it.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Freezer: fix more fallout from the thaw_process rename

Commit 944e192db53c "freezer: rename thaw_process() to __thaw_task()
and simplify the implementation" did not create a !CONFIG_FREEZER version
of __thaw_task().

Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM / Hibernate: Refactor and simplify hibernation_snapshot() code

The goto statements in hibernation_snapshot() are a bit complex.
Refactor the code to remove some of them, thereby simplifying the
implementation.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM / Sleep: Simplify device_suspend_noirq()

Remove a few if () and return statements in device_suspend_noirq()
that aren't really necessary.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

PM / Sleep: Remove unnecessary label and jumps to it form PM core code

The "End" label in device_prepare() in drivers/base/power/main.c is
not necessary and the jumps to it have no real effect, so remove them
all.

Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

PM / Memory-hotplug: Avoid task freezing failures

The lock_system_sleep() function is used in the memory hotplug code at
several places in order to implement mutual exclusion with hibernation.
However, this function tries to acquire the 'pm_mutex' lock using
mutex_lock() and hence blocks in TASK_UNINTERRUPTIBLE state if it doesn't
get the lock. This would lead to task freezing failures and hence
hibernation failure as a consequence, even though the hibernation call path
successfully acquired the lock.

But it is to be noted that, since this task tries to acquire pm_mutex, if it
blocks due to this, we are *100% sure* that this task is not going to run
as long as hibernation sequence is in progress, since hibernation releases
'pm_mutex' only at the very end, when everything is done.
And this means, this task is going to be anyway blocked for much more longer
than what the freezer intends to achieve; which means, freezing and thawing
doesn't really make any difference to this task!

So, to fix freezing failures, we just ask the freezer to skip freezing this
task, since it is already "frozen enough".

But instead of calling freezer_do_not_count() and freezer_count() as it is,
we use only the relevant parts of those functions, because restrictions
such as 'the task should be a userspace one' etc., might not be relevant in
this scenario.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

PM: Fix indentation and remove extraneous whitespaces in kernel/power/main.c

Lack of proper indentation of the goto statement decreases the readability
of code significantly. In fact, this made me look twice at the code to check
whether it really does what it should be doing. Fix this.

And in the same file, there are some extra whitespaces. Get rid of them too.

Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>

Merge branch 'pm-freezer' of git://git./linux/kernel/git/tj/misc into pm-freezer

* 'pm-freezer' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/misc: (24 commits)
  freezer: fix wait_event_freezable/__thaw_task races
  freezer: kill unused set_freezable_with_signal()
  dmatest: don't use set_freezable_with_signal()
  usb_storage: don't use set_freezable_with_signal()
  freezer: remove unused @sig_only from freeze_task()
  freezer: use lock_task_sighand() in fake_signal_wake_up()
  freezer: restructure __refrigerator()
  freezer: fix set_freezable[_with_signal]() race
  freezer: remove should_send_signal() and update frozen()
  freezer: remove now unused TIF_FREEZE
  freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE
  cgroup_freezer: prepare for removal of TIF_FREEZE
  freezer: clean up freeze_processes() failure path
  freezer: kill PF_FREEZING
  freezer: test freezable conditions while holding freezer_lock
  freezer: make freezing indicate freeze condition in effect
  freezer: use dedicated lock instead of task_lock() + memory barrier
  freezer: don't distinguish nosig tasks on thaw
  freezer: remove racy clear_freeze_flag() and set PF_NOFREEZE on dead tasks
  freezer: rename thaw_process() to __thaw_task() and simplify the implementation
  ...

PM / Hibernate: Do not leak memory in error/test code paths

The hibernation core code forgets to release memory preallocated
for hibernation if there's an error in its early stages or if test
modes causing hibernation_snapshot() to return early are used. This
causes the system to be hardly usable, because the amount of
preallocated memory is usually huge. Fix this problem.

Reported-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

Merge branch 'for-linus' of git://git./linux/kernel/git/rostedt/linux-ktest

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
ktest: Check parent options for iterated tests

Merge branch 'i2c-for-linus' of git://git./linux/kernel/git/jdelvare/staging

* 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
  i2c: Make i2cdev_notifier_call static
  i2c: Delete ANY_I2C_BUS
  i2c: Fix device name for 10-bit slave address
  i2c-algo-bit: Generate correct i2c address sequence for 10-bit target

Merge branch 'for-linus' of git://git./linux/kernel/git/broonie/regulator

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: TPS65910: Fix VDD1/2 voltage selector count

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (31 commits)
  drm: integer overflow in drm_mode_dirtyfb_ioctl()
  drivers/gpu/vga/vgaarb.c: add missing kfree
  drm/radeon/kms/atom: unify i2c gpio table handling
  drm/radeon/kms: fix up gpio i2c mask bits for r4xx for real
  ttm: Don't return the bo reserved on error path
  drm/radeon/kms: add a CS ioctl flag not to rewrite tiling flags in the CS
  drm/i915: Fix inconsistent backlight level during disabled
  drm, i915: Fix memory leak in i915_gem_busy_ioctl().
  drm/i915: Use DPCD value for max DP lanes.
  drm/i915: Initiate DP link training only on the lanes we'll be using
  drm/i915: Remove trailing white space
  drm/i915: Try harder during dp pattern 1 link training
  drm/i915: Make DP prepare/commit consistent with DP dpms
  drm/i915: Let panel power sequencing hardware do its job
  drm/i915: Treat PCH eDP like DP in most places
  drm/i915: Remove link_status field from intel_dp structure
  drm/i915: Move common PCH_PP_CONTROL setup to ironlake_get_pp_control
  drm/i915: Module parameters using '-1' as default must be signed type
  drm/i915: Turn on another required clock gating bit on gen6.
  drm/i915: Turn on a required 3D clock gating bit on Sandybridge.
  ...

freezer: fix wait_event_freezable/__thaw_task races

wait_event_freezable() and friends stop the waiting if try_to_freeze()
fails. This is not right, we can race with __thaw_task() and in this
case

- wait_event_freezable() returns the wrong ERESTARTSYS

- wait_event_freezable_timeout() can return the positive
value while condition == F

Change the code to always check __retval/condition before return.

Note: with or without this patch the timeout logic looks strange,
probably we should recalc timeout if try_to_freeze() returns T.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>

freezer: kill unused set_freezable_with_signal()

There's no in-kernel user of set_freezable_with_signal() left.  Mixing
TIF_SIGPENDING with kernel threads can lead to nasty corner cases as
kernel threads never travel signal delivery path on their own.

e.g. the current implementation is buggy in the cancelation path of
__thaw_task().  It calls recalc_sigpending_and_wake() in an attempt to
clear TIF_SIGPENDING but the function never clears it regardless of
sigpending state.  This means that signallable freezable kthreads may
continue executing with !freezing() && stuck TIF_SIGPENDING, which can
be troublesome.

This patch removes set_freezable_with_signal() along with
PF_FREEZER_NOSIG and recalc_sigpending*() calls in freezer.  User
tasks get TIF_SIGPENDING, kernel tasks get woken up and the spurious
sigpending is dealt with in the usual signal delivery path.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>

dmatest: don't use set_freezable_with_signal()

Commit 981ed70d8e (dmatest: make dmatest threads freezable) made
dmatest kthread use set_freezable_with_signal(); however, the
interface is scheduled to be removed in the next merge window.

The problem is that unlike userland tasks there's no default place
which handles signal pending state and it isn't clear who owns and/or
is responsible for clearing TIF_SIGPENDING.  For example, in the
current code, try_to_freeze() clears TIF_SIGPENDING but it isn't sure
whether it actually owns the TIF_SIGPENDING nor is it race-free -
ie. the task may continue to run with TIF_SIGPENDING set after the
freezable section.

Unfortunately, we don't have wait_for_completion_freezable_timeout().
This patch open codes it and uses wait_event_freezable_timeout()
instead and removes timeout reloading - wait_event_freezable_timeout()
won't return across freezing events (currently racy but fix scheduled)
and timer doesn't decrement while the task is in freezer.  Although
this does lose timer-reset-over-freezing, given that timeout is
supposed to be long enough and failure to finish inside is considered
irrecoverable, I don't think this is worth the complexity.

While at it, move completion to outer scope and explain that we're
ignoring dangling pointer problem after timeout.  This should give
slightly better chance at avoiding oops after timeout.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Cc: Guennadi Liakhovetski <g.liakhovetski@gmx.de>
Cc: Nicolas Ferre <nicolas.ferre@atmel.com>

regulator: TPS65910: Fix VDD1/2 voltage selector count

Count of selector voltage is required for regulator_set_voltage
to work via set_voltage_sel. VDD1/2 currently have it as zero,
so regulator_set_voltage won't work for VDD1/2.
Update count (n_voltages) for VDD1/2.

Output Voltage = (step value * 12.5 mV + 562.5 mV) * gain

With above expr, number of voltages that can be selected is
step value count * gain count

constant for gain count will be called VDD1_2_NUM_VOLT_COARSE

existing constant for step value count is VDD1_2_NUM_VOLTS,
use VDD1_2_NUM_VOLT_FINE instead to make clear that step value
is not the only component in deciding selectable voltage count

Signed-off-by: Afzal Mohammed <afzal@ti.com>
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>

i2c: Make i2cdev_notifier_call static

The function i2cdev_notifier_call is used only in i2c-dev file
making it static.
Also removes the following sparse warning

drivers/i2c/i2c-dev.c:582:5: warning: symbol 'i2cdev_notifier_call'
was not declared. Should it be static?

Signed-off-by: Shubhrajyoti D <shubhrajyoti@ti.com>
Signed-off-by: Jean Delvare <khali@linux-fr.org>

i2c: Delete ANY_I2C_BUS

Last piece of code using ANY_I2C_BUS was deleted almost 2 years ago,
so ANY_I2C_BUS can go away as well.

Signed-off-by: Jean Delvare <khali@linux-fr.org>

i2c: Fix device name for 10-bit slave address

10-bit addresses overlap with traditional 7-bit addresses, leading in
device name collisions. Add an arbitrary offset to 10-bit addresses to
prevent this collision. The offset was chosen so that the address is
still easily recognizable.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Acked-by: Wolfram Sang <w.sang@pengutronix.de>

i2c-algo-bit: Generate correct i2c address sequence for 10-bit target

The wrong bits were put on the wire, fix that.

This fixes kernel bug #42562.

Signed-off-by: Sheng-Hui J. Chu <jeffchu@broadcom.com>
Cc: stable@kernel.org
Signed-off-by: Jean Delvare <khali@linux-fr.org>

drm: integer overflow in drm_mode_dirtyfb_ioctl()

There is a potential integer overflow in drm_mode_dirtyfb_ioctl()
if userspace passes in a large num_clips. The call to kmalloc would
allocate a small buffer, and the call to fb->funcs->dirty may result
in a memory corruption.

Reported-by: Haogang Chen <haogangchen@gmail.com>
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Dave Airlie <airlied@redhat.com>

Revert "of/irq: of_irq_find_parent: check for parent equal to child"

This reverts commit dc9372808412edbc653a675a526c2ee6c0c14a91.

As requested by Ben Herrenschmidt:
  "This breaks some powerpc platforms at least.  The practice of having
   a node provide an explicit "interrupt-parent" property pointing to
   itself is an old trick that we've used in the past to allow a
   device-node to have interrupts routed to different controllers.

   In that case, the node also contains an interrupt-map, so the node is
   its own parent, the interrupt resolution hits the map, which then can
   route each individual interrupt to a different parent."

Grant says:
  "Ah, nuts, yes that is broken then.  Yes, please revert the commit and
   Rob & I will come up with a better solution.

   Rob, I think it can be done by explicitly checking for np ==
   desc->interrupt_parent in of_irq_init() instead of relying on
   of_irq_find_parent() returning NULL."

Requested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Grant Likely <grant.likely@secretlab.ca>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: devicetree-discuss@lists.ozlabs.org
Cc: linuxppc-dev <linuxppc-dev@lists.ozlabs.org>
Cc: Tanmay Inamdar <tinamdar@apm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  mount_subtree() pointless use-after-free
  iio: fix a leak due to improper use of anon_inode_getfd()
  microblaze: bury asm/namei.h

drivers/gpu/vga/vgaarb.c: add missing kfree

kbuf is a buffer that is local to this function, so all of the error paths
leaving the function should release it.

Signed-off-by: Julia Lawall <julia@diku.dk>
Cc: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon/kms/atom: unify i2c gpio table handling

Split the quirks and i2c_rec assignment into separate
functions used by both radeon_lookup_i2c_gpio() and
radeon_atombios_i2c_init(). This avoids duplicating code
and cases where quirks were only added to one of the
functions.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

drm/radeon/kms: fix up gpio i2c mask bits for r4xx for real

Fixes i2c test failures when i2c_algo_bit.bit_test=1.

The hw doesn't actually require a mask, so just set it
to the default mask bits for r1xx-r4xx radeon ddc.

I missed this part the first time through.

Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@kernel.org
Cc: Jean Delvare <khali@linux-fr.org>
Signed-off-by: Dave Airlie <airlied@redhat.com>

ttm: Don't return the bo reserved on error path

An unlikely race could case a bo to be returned reserved on an error path.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Reviewed-by: Jerome Glisse <jglisse@redhat.com>
Signed-off-by: Dave Airlie <airlied@redhat.com>

Merge branch 'drm-intel-fixes' of git://people.freedesktop.org/~keithp/linux into drm-fixes

* 'drm-intel-fixes' of git://people.freedesktop.org/~keithp/linux: (25 commits)
  drm/i915: Fix inconsistent backlight level during disabled
  drm, i915: Fix memory leak in i915_gem_busy_ioctl().
  drm/i915: Use DPCD value for max DP lanes.
  drm/i915: Initiate DP link training only on the lanes we'll be using
  drm/i915: Remove trailing white space
  drm/i915: Try harder during dp pattern 1 link training
  drm/i915: Make DP prepare/commit consistent with DP dpms
  drm/i915: Let panel power sequencing hardware do its job
  drm/i915: Treat PCH eDP like DP in most places
  drm/i915: Remove link_status field from intel_dp structure
  drm/i915: Move common PCH_PP_CONTROL setup to ironlake_get_pp_control
  drm/i915: Module parameters using '-1' as default must be signed type
  drm/i915: Turn on another required clock gating bit on gen6.
  drm/i915: Turn on a required 3D clock gating bit on Sandybridge.
  drm/i915: enable cacheable objects on Ivybridge
  drm/i915: add constants to size fence arrays and fields
  drm/i915: Ivybridge still has fences!
  drm/i915: forcewake warning fixes in debugfs
  drm/i915: Fix object refcount leak on mmappable size limit error path.
  drm/i915: Use mode_config.mutex in ironlake_panel_vdd_work
  ...

mount_subtree() pointless use-after-free

d'oh... we'd carefully pinned mnt->mnt_sb down, dropped mnt and attempt
to grab s_umount on mnt->mnt_sb. The trouble is, *mnt might've been
overwritten by now...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge branch 'for-linus' of git://git./linux/kernel/git/dtor/input

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
  Input: ams_delta_serio - include linux/module.h
  Input: elantech - adjust hw_version detection logic
  Input: i8042 - add HP Pavilion dv4s to 'notimeout' and 'nomux' blacklists

Merge git://www.linux-watchdog.org/linux-watchdog

* git://www.linux-watchdog.org/linux-watchdog:
  watchdog: fix initialisation printout in s3c2410_wdt
  watchdog: Don't overwrite error value in wm831x_wdt_set_timeout()
  watchdog: adx_wdt.c: remove driver

Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  NFS: Revert pnfs ugliness from the generic NFS read code path
  SUNRPC: destroy freshly allocated transport in case of sockaddr init error
  NFS: Fix a regression in the referral code
  nfs: move nfs_file_operations declaration to bottom of file.c (try #2)
  nfs: when attempting to open a directory, fall back on normal lookup (try #5)

Merge branch 'for-linus' of git://git./linux/kernel/git/mason/linux-btrfs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs:
  Btrfs: remove free-space-cache.c WARN during log replay
  Btrfs: sectorsize align offsets in fiemap
  Btrfs: clear pages dirty for io and set them extent mapped
  Btrfs: wait on caching if we're loading the free space cache
  Btrfs: prefix resize related printks with btrfs:
  btrfs: fix stat blocks accounting
  Btrfs: avoid unnecessary bitmap search for cluster setup
  Btrfs: fix to search one more bitmap for cluster setup
  btrfs: mirror_num should be int, not u64
  btrfs: Fix up 32/64-bit compatibility for new ioctls
  Btrfs: fix barrier flushes
  Btrfs: fix tree corruption after multi-thread snapshots and inode_cache flush

Merge branch 'writeback-for-linus' of git://git./linux/kernel/git/wfg/linux

* 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux:
  writeback: remove vm_dirties and task->dirties
  writeback: hard throttle 1000+ dd on a slow USB stick
  mm: Make task in balance_dirty_pages() killable

Merge branch 'staging-linus' of git://git./linux/kernel/git/gregkh/staging

* 'staging-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging:
  staging: fix more ET131X build errors
  staging: et131x depends on NET
  staging: slicoss depends on NET
  linux-next: et131x: Fix build error when CONFIG_PM_SLEEP not enabled

Merge branch 'usb-linus' of git://git./linux/kernel/git/gregkh/usb

* 'usb-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (48 commits)
  USB: Fix Corruption issue in USB ftdi driver ftdi_sio.c
  USB: option: add PID of Huawei E173s 3G modem
  OHCI: final fix for NVIDIA problems (I hope)
  USB: option: release new PID for ZTE 3G modem
  usb: Netlogic: Fix HC_LENGTH call in ehci-xls.c
  USB: storage: ene_ub6250: fix compile warnings
  USB: option: add id for 3G dongle Model VT1000 of Viettel
  USB: serial: pl2303: rm duplicate id
  USB: pch_udc: Change company name OKI SEMICONDUCTOR to LAPIS Semiconductor
  USB: pch_udc: Support new device LAPIS Semiconductor ML7831 IOH
  usb-storage: Accept 8020i-protocol commands longer than 12 bytes
  USB: quirks: adding more quirky webcams to avoid squeaky audio
  powerpc/usb: fix type cast for address of ioremap to compatible with 64-bit
  USB: at91: at91-ohci: fix set/get power
  USB: cdc-acm: Fix disconnect() vs close() race
  USB: add quirk for Logitech C600 web cam
  USB: EHCI: fix HUB TT scheduling issue with iso transfer
  USB: XHCI: resume root hubs when the controller resumes
  USB: workaround for bug in old version of GCC
  USB: ark3116 initialisation fix
  ...

Merge branch 'tty-linus' of git://git./linux/kernel/git/gregkh/tty

* 'tty-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
  TTY: ldisc, wait for ldisc infinitely in hangup
  TTY: ldisc, move wait idle to caller
  TTY: ldisc, allow waiting for ldisc arbitrarily long
  Revert "tty/serial: Prevent drop of DCD on suspend for Tegra UARTs"
  RS485: fix inconsistencies in the meaning of some variables
  pch_uart: Fix DMA resource leak issue
  serial,mfd: Fix CMSPAR setup
  tty/serial: Prevent drop of DCD on suspend for Tegra UARTs
  pch_uart: Change company name OKI SEMICONDUCTOR to LAPIS Semiconductor
  pch_uart: Support new device LAPIS Semiconductor ML7831 IOH
  pch_uart: Fix hw-flow control issue
  tty: hvc_dcc: Fix duplicate character inputs
  jsm: Change maintainership

Merge branch 'driver-core-linus' of git://git./linux/kernel/git/gregkh/driver-core

* 'driver-core-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
  drivers/base/node.c: fix compilation error with older versions of gcc
  uio: documentation fixups
  device.h: Fix struct member documentation

Merge branch 'char-misc-linus' of git://git./linux/kernel/git/gregkh/char-misc

* 'char-misc-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
  misc: ad525x_dpot: Fix AD8400 spi transfer size.
  pch_phub: Fix MAC address writing issue for LAPIS ML7831
  pch_phub: Improve ADE(Address Decode Enable) control
  pch_phub: Change company name OKI SEMICONDUCTOR to LAPIS Semiconductor
  pch_phub: Support new device LAPIS Semiconductor ML7831 IOH
  pcie-gadget-spear: Add "platform:" prefix for platform modalias
  MAINTAINERS: add CHAR and MISC driver maintainers

iio: fix a leak due to improper use of anon_inode_getfd()

it can fail and in that case ->release() will *not* be called...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

microblaze: bury asm/namei.h

altroot support has been gone for years, along with arch/*/asm/namei.h;
looks like a dummy survivor that sat it out in microblaze tree...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

usb_storage: don't use set_freezable_with_signal()

The current implementation of set_freezable_with_signal() is buggy and
tricky to get right.  usb-storage is the only user and its use can be
avoided trivially.

All usb-storage wants is to be able to sleep with timeout and get
woken up if freezing() becomes true.  This can be trivially
implemented by doing interruptible wait w/ freezing() included in the
wait condition.  There's no reason to use set_freezable_with_signal().

Perform interruptible wait on freezing() instead of using
set_freezable_with_signal(), which is scheduled for removal.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Seth Forshee <seth.forshee@canonical.com>
Cc: Alan Stern <stern@rowland.harvard.edu>
Cc: Greg Kroah-Hartman <gregkh@suse.de>

freezer: remove unused @sig_only from freeze_task()

After "freezer: make freezing() test freeze conditions in effect
instead of TIF_FREEZE", freezing() returns authoritative answer on
whether the current task should freeze or not and freeze_task()
doesn't need or use @sig_only. Remove it.

While at it, rewrite function comment for freeze_task() and rename
@sig_only to @user_only in try_to_freeze_tasks().

This patch doesn't cause any functional change.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Oleg Nesterov <oleg@redhat.com>

freezer: use lock_task_sighand() in fake_signal_wake_up()

cgroup_freezer calls freeze_task() without holding tasklist_lock and,
if the task is exiting, its ->sighand may be gone by the time
fake_signal_wake_up() is called. Use lock_task_sighand() instead of
accessing ->sighand directly.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Paul Menage <paul@paulmenage.org>

freezer: restructure __refrigerator()

If another freeze happens before all tasks leave FROZEN state after
being thawed, the freezer can see the existing FROZEN and consider the
tasks to be frozen but they can clear FROZEN without checking the new
freezing().

Oleg suggested restructuring __refrigerator() such that there's single
condition check section inside freezer_lock and sigpending is cleared
afterwards, which fixes the problem and simplifies the code.
Restructure accordingly.

-v2: Frozen loop exited without releasing freezer_lock. Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>

freezer: fix set_freezable[_with_signal]() race

A kthread doing set_freezable*() may race with on-going PM freeze and
the freezer might think all tasks are frozen while the new freezable
kthread is merrily proceeding to execute code paths which aren't
supposed to be executing during PM freeze.

Reimplement set_freezable[_with_signal]() using __set_freezable() such
that freezable PF flags are modified under freezer_lock and
try_to_freeze() is called afterwards. This eliminates race condition
against freezing.

Note: Separated out from larger patch to resolve fix order dependency
Oleg pointed out.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>

freezer: remove should_send_signal() and update frozen()

should_send_signal() is only used in freezer.c. Exporting them only
increases chance of abuse. Open code the two users and remove it.

Update frozen() to return bool.

Signed-off-by: Tejun Heo <tj@kernel.org>

freezer: remove now unused TIF_FREEZE

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: linux-arch@vger.kernel.org

freezer: make freezing() test freeze conditions in effect instead of TIF_FREEZE

Using TIF_FREEZE for freezing worked when there was only single
freezing condition (the PM one); however, now there is also the
cgroup_freezer and single bit flag is getting clumsy.
thaw_processes() is already testing whether cgroup freezing in in
effect to avoid thawing tasks which were frozen by both PM and cgroup
freezers.

This is racy (nothing prevents race against cgroup freezing) and
fragile.  A much simpler way is to test actual freeze conditions from
freezing() - ie. directly test whether PM or cgroup freezing is in
effect.

This patch adds variables to indicate whether and what type of
freezing conditions are in effect and reimplements freezing() such
that it directly tests whether any of the two freezing conditions is
active and the task should freeze.  On fast path, freezing() is still
very cheap - it only tests system_freezing_cnt.

This makes the clumsy dancing aroung TIF_FREEZE unnecessary and
freeze/thaw operations more usual - updating state variables for the
new state and nudging target tasks so that they notice the new state
and comply.  As long as the nudging happens after state update, it's
race-free.

* This allows use of freezing() in freeze_task().  Replace the open
  coded tests with freezing().

* p != current test is added to warning printing conditions in
  try_to_freeze_tasks() failure path.  This is necessary as freezing()
  is now true for the task which initiated freezing too.

-v2: Oleg pointed out that re-freezing FROZEN cgroup could increment
     system_freezing_cnt.  Fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Menage <paul@paulmenage.org> (for the cgroup portions)

cgroup_freezer: prepare for removal of TIF_FREEZE

TIF_FREEZE will be removed soon and freezing() will directly test
whether any freezing condition is in effect.  Make the following
changes in preparation.

* Rename cgroup_freezing_or_frozen() to cgroup_freezing() and make it
  return bool.

* Make cgroup_freezing() access task_freezer() under rcu read lock
  instead of task_lock().  This makes the state dereferencing racy
  against task moving to another cgroup; however, it was already racy
  without this change as ->state dereference wasn't synchronized.
  This will be later dealt with using attach hooks.

* freezer->state is now set before trying to push tasks into the
  target state.

-v2: Oleg pointed out that freeze_change_state() was setting
     freeze->state incorrectly to CGROUP_FROZEN instead of
     CGROUP_FREEZING.  Fixed.

-v3: Matt pointed out that setting CGROUP_FROZEN used to always invoke
     try_to_freeze_cgroup() regardless of the current state.  Patch
     updated such that the actual freeze/thaw operations are always
     performed on invocation.  This shouldn't make any difference
     unless something is broken.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
Cc: Oleg Nesterov <oleg@redhat.com>

freezer: clean up freeze_processes() failure path

freeze_processes() failure path is rather messy.  Freezing is canceled
for workqueues and tasks which aren't frozen yet but frozen tasks are
left alone and should be thawed by the caller and of course some
callers (xen and kexec) didn't do it.

This patch updates __thaw_task() to handle cancelation correctly and
makes freeze_processes() and freeze_kernel_threads() call
thaw_processes() on failure instead so that the system is fully thawed
on failure.  Unnecessary [suspend_]thaw_processes() calls are removed
from kernel/power/hibernate.c, suspend.c and user.c.

While at it, restructure error checking if clause in suspend_prepare()
to be less weird.

-v2: Srivatsa spotted missing removal of suspend_thaw_processes() in
     suspend_prepare() and error in commit message.  Updated.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>

freezer: kill PF_FREEZING

With the previous changes, there's no meaningful difference between
PF_FREEZING and PF_FROZEN. Remove PF_FREEZING and use PF_FROZEN
instead in task_contributes_to_load().

Signed-off-by: Tejun Heo <tj@kernel.org>

freezer: test freezable conditions while holding freezer_lock

try_to_freeze_tasks() and thaw_processes() use freezable() and
frozen() as preliminary tests before initiating operations on a task.
These are done without any synchronization and hinder with
synchronization cleanup without any real performance benefits.

In try_to_freeze_tasks(), open code self test and move PF_NOFREEZE and
frozen() tests inside freezer_lock in freeze_task().

thaw_processes() can simply drop freezable() test as frozen() test in
__thaw_task() is enough.

Note: This used to be a part of larger patch to fix set_freezable()
race. Separated out to satisfy ordering among dependent fixes.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>

freezer: make freezing indicate freeze condition in effect

Currently freezing (TIF_FREEZE) and frozen (PF_FROZEN) states are
interlocked - freezing is set to request freeze and when the task
actually freezes, it clears freezing and sets frozen.

This interlocking makes things more complex than necessary - freezing
doesn't mean there's freezing condition in effect and frozen doesn't
match the task actually entering and leaving frozen state (it's
cleared by the thawing task).

This patch makes freezing indicate that freeze condition is in effect.
A task enters and stays frozen if freezing.  This makes PF_FROZEN
manipulation done only by the task itself and prevents wakeup from
__thaw_task() leaking outside of refrigerator.

The only place which needs to tell freezing && !frozen is
try_to_freeze_task() to whine about tasks which don't enter frozen.
It's updated to test the condition explicitly.

With the change, frozen() state my linger after __thaw_task() until
the task wakes up and exits fridge.  This can trigger BUG_ON() in
update_if_frozen().  Work it around by testing freezing() && frozen()
instead of frozen().

-v2: Oleg pointed out missing re-check of freezing() when trying to
     clear FROZEN and possible spurious BUG_ON() trigger in
     update_if_frozen().  Both fixed.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Menage <paul@paulmenage.org>

freezer: use dedicated lock instead of task_lock() + memory barrier

Freezer synchronization is needlessly complicated - it's by no means a
hot path and the priority is staying unintrusive and safe. This patch
makes it simply use a dedicated lock instead of piggy-backing on
task_lock() and playing with memory barriers.

On the failure path of try_to_freeze_tasks(), locking is moved from it
to cancel_freezing(). This makes the frozen() test racy but the race
here is a non-issue as the warning is printed for tasks which failed
to enter frozen for 20 seconds and race on PF_FROZEN at the last
moment doesn't change anything.

This simplifies freezer implementation and eases further changes
including some race fixes.

Signed-off-by: Tejun Heo <tj@kernel.org>

freezer: don't distinguish nosig tasks on thaw

There's no point in thawing nosig tasks before others. There's no
ordering requirement between the two groups on thaw, which the staged
thawing can't guarantee anyway. Simplify thaw_processes() by removing
the distinction and collapsing thaw_tasks() into thaw_processes().
This will help further updates to freezer.

Signed-off-by: Tejun Heo <tj@kernel.org>

freezer: remove racy clear_freeze_flag() and set PF_NOFREEZE on dead tasks

clear_freeze_flag() in exit_mm() is racy.  Freezing can start
afterwards.  Remove it.  Skipping freezer for exiting task will be
properly implemented later.

Also, freezable() was testing exit_state directly to make system
freezer ignore dead tasks.  Let the exiting task set PF_NOFREEZE after
entering TASK_DEAD instead.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>

freezer: rename thaw_process() to __thaw_task() and simplify the implementation

thaw_process() now has only internal users - system and cgroup
freezers.  Remove the unnecessary return value, rename, unexport and
collapse __thaw_process() into it.  This will help further updates to
the freezer code.

-v3: oom_kill grew a use of thaw_process() while this patch was
     pending.  Convert it to use __thaw_task() for now.  In the longer
     term, this should be handled by allowing tasks to die if killed
     even if it's frozen.

-v2: minor style update as suggested by Matt.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Paul Menage <menage@google.com>
Cc: Matt Helsley <matthltc@us.ibm.com>

freezer: implement and use kthread_freezable_should_stop()

Writeback and thinkpad_acpi have been using thaw_process() to prevent
deadlock between the freezer and kthread_stop(); unfortunately, this
is inherently racy - nothing prevents freezing from happening between
thaw_process() and kthread_stop().

This patch implements kthread_freezable_should_stop() which enters
refrigerator if necessary but is guaranteed to return if
kthread_stop() is invoked. Both thaw_process() users are converted to
use the new function.

Note that this deadlock condition exists for many of freezable
kthreads. They need to be converted to use the new should_stop or
freezable workqueue.

Tested with synthetic test case.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Henrique de Moraes Holschuh <ibm-acpi@hmh.eng.br>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>

freezer: unexport refrigerator() and update try_to_freeze() slightly

There is no reason to export two functions for entering the
refrigerator.  Calling refrigerator() instead of try_to_freeze()
doesn't save anything noticeable or removes any race condition.

* Rename refrigerator() to __refrigerator() and make it return bool
  indicating whether it scheduled out for freezing.

* Update try_to_freeze() to return bool and relay the return value of
  __refrigerator() if freezing().

* Convert all refrigerator() users to try_to_freeze().

* Update documentation accordingly.

* While at it, add might_sleep() to try_to_freeze().

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Samuel Ortiz <samuel@sortiz.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jan Kara <jack@suse.cz>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: Christoph Hellwig <hch@infradead.org>

freezer: don't unnecessarily set PF_NOFREEZE explicitly

Some drivers set PF_NOFREEZE in their kthread functions which is
completely unnecessary and racy - some part of freezer code doesn't
consider cases where PF_NOFREEZE is set asynchronous to freezer
operations.

In general, there's no reason to allow setting PF_NOFREEZE explicitly.
Remove them and change the documentation to note that setting
PF_NOFREEZE directly isn't allowed.

-v2: Dropped change to twl4030-irq.c as it no longer uses PF_NOFREEZE.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: "Gustavo F. Padovan" <padovan@profusion.mobi>
Acked-by: Samuel Ortiz <sameo@linux.intel.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: wwang <wei_wang@realsil.com.cn>

freezer: fix current->state restoration race in refrigerator()

refrigerator() saves current->state before entering frozen state and
restores it before returning using __set_current_state(); however,
this is racy, for example, please consider the following sequence.

set_current_state(TASK_INTERRUPTIBLE);
try_to_freeze();
if (kthread_should_stop())
break;
schedule();

If kthread_stop() races with ->state restoration, the restoration can
restore ->state to TASK_INTERRUPTIBLE after kthread_stop() sets it to
TASK_RUNNING but kthread_should_stop() may still see zero
->should_stop because there's no memory barrier between restoring
TASK_INTERRUPTIBLE and kthread_should_stop() test.

This isn't restricted to kthread_should_stop(). current->state is
often used in memory barrier based synchronization and silently
restoring it w/o mb breaks them.

Use set_current_state() instead.

Signed-off-by: Tejun Heo <tj@kernel.org>

Merge branch 'dev' of git://git./linux/kernel/git/tytso/ext4

* 'dev' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix up a undefined error in ext4_free_blocks in debugging code
  ext4: add blk_finish_plug in error case of writepages.
  ext4: Remove kernel_lock annotations
  ext4: ignore journalled data options on remount if fs has no journal

Merge branch 'for-linus' of git://git./linux/kernel/git/sage/ceph-client

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: Allocate larger oid buffer in request msgs
  ceph: initialize root dentry
  ceph: fix iput race when queueing inode work

Merge branch 'for-linus' of git://git./linux/kernel/git/viro/vfs

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
  VFS: Log the fact that we've given ELOOP rather than creating a loop
  minixfs: kill manual hweight(), simplify
  fs/minix: Verify bitmap block counts before mounting

fix braino in um patchset (mea culpa)

wrong register returned...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Btrfs: remove free-space-cache.c WARN during log replay

The log replay code only partially loads block groups, since
the block group caching code is able to detect and deal with
extents the logging code has pinned down.

While the logging code is pinning down block groups, there is
a bogus WARN_ON we're hitting if the code wasn't able to find
an extent in the cache. This commit removes the warning because
it can happen any time there isn't a valid free space cache
for that block group.

Signed-off-by: Chris Mason <chris.mason@oracle.com>

ext4: fix up a undefined error in ext4_free_blocks in debugging code

sbi is not defined, so let ext4_free_blocks use EXT4_SB(sb) instead
when EXT4FS_DEBUG is defined.

Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>

VFS: Log the fact that we've given ELOOP rather than creating a loop

To prevent an NFS server from being used to create a directory loop in an NFS
superblock on the client, the following patch was committed:

commit 1836750115f20b774e55c032a3893e8c5bdf41ed
Author: Al Viro <viro@zeniv.linux.org.uk>
Date:   Tue Jul 12 21:42:24 2011 -0400
Subject: fix loop checks in d_materialise_unique()

This causes ELOOP to be reported to anyone trying to access the dentry that
would otherwise cause the kernel to complete the loop.

However, no indication is given to the caller as to why an operation that ought
to work doesn't.  The fault is with the kernel, which doesn't want to try and
solve the problem as it gets horrendously messy if there's another mountpoint
somewhere in the trees being spliced that can't be moved[*].

[*] The real problem is that we don't handle the excision of a subtree that
gets moved _out_ of what we can see.  This can happen on the server where a
directory is merely moved between two other dirs on the same filesystem, but
where destination dir is not accessible by the client.

So, given the choice to return ELOOP rather than trying to reconfigure the
dentry tree, we should give the caller some indication of why they aren't being
allowed to make what should be a legitimate request and log a message.

Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Sachin Prabhu <sprabhu@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Merge git://git./linux/kernel/git/davem/net

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (86 commits)
  ipv4: fix redirect handling
  ping: dont increment ICMP_MIB_INERRORS
  sky2: fix hang in napi_disable
  sky2: enforce minimum ring size
  bonding: Don't allow mode change via sysfs with slaves present
  f_phonet: fix page offset of first received fragment
  stmmac: fix pm functions avoiding sleep on spinlock
  stmmac: remove spin_lock in stmmac_ioctl.
  stmmac: parameters auto-tuning through HW cap reg
  stmmac: fix advertising 1000Base capabilties for non GMII iface
  stmmac: use mdelay on timeout of sw reset
  sky2: version 1.30
  sky2: used fixed RSS key
  sky2: reduce default Tx ring size
  sky2: rename up/down functions
  sky2: pci posting issues
  sky2: fix hang on shutdown (and other irq issues)
  r6040: fix check against MCRO_HASHEN bit in r6040_multicast_list
  MAINTAINERS: change email address for shemminger
  pch_gbe: Move #include of module.h
  ...

Merge branch 'kvm-updates/3.2' of git://git./virt/kvm/kvm

* 'kvm-updates/3.2' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM guest: prevent tracing recursion with kvmclock
  Revert "KVM: PPC: Add support for explicit HIOR setting"
  KVM: VMX: Check for automatic switch msr table overflow
  KVM: VMX: Add support for guest/host-only profiling
  KVM: VMX: add support for switching of PERF_GLOBAL_CTRL
  KVM: s390: announce SYNC_MMU
  KVM: s390: Fix tprot locking
  KVM: s390: handle SIGP sense running intercepts
  KVM: s390: Fix RUNNING flag misinterpretation

Merge branch 'fixes' of ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm

* 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm:
  ARM: wire up process_vm_writev and process_vm_readv syscalls
  ARM: 7160/1: setup: avoid overflowing {elf,arch}_name from proc_info_list
  ARM: 7158/1: add new MFP implement for NUC900
  ARM: 7157/1: fix a building WARNING for nuc900
  ARM: 7156/1: l2x0: fix compile error on !CONFIG_USE_OF
  ARM: 7155/1: arch.h: Declare 'pt_regs' locally
  ARM: 7154/1: mach-bcmring: fix build error in dma.c
  ARM: 7153/1: mach-bcmring: fix build error in core.c
  ARM: 7152/1: distclean: Remove generated .dtb files
  ARM: 7150/1: Allow kernel unaligned accesses on ARMv6+ processors
  ARM: 7149/1: spi/pl022: Enable clock in probe
  Revert "ARM: 7098/1: kdump: copy kernel relocation code at the kexec prepare stage"

Merge branch 'pm-fixes' of git://git./linux/kernel/git/rafael/linux-pm

* 'pm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  PM / Suspend: Fix bug in suspend statistics update
  PM / Hibernate: Fix the early termination of test modes
  PM / shmobile: Fix build of sh7372_pm_init() for CONFIG_PM unset
  PM Sleep: Do not extend wakeup paths to devices with ignore_children set
  PM / driver core: disable device's runtime PM during shutdown
  PM / devfreq: correct Kconfig dependency
  PM / devfreq: fix use after free in devfreq_remove_device
  PM / shmobile: Avoid restoring the INTCS state during initialization
  PM / devfreq: Remove compiler error after irq.h update
  PM / QoS: Properly use the WARN() macro in dev_pm_qos_add_request()
  PM / Clocks: Only disable enabled clocks in pm_clk_suspend()
  ARM: mach-shmobile: sh7372 A3SP no_suspend_console fix
  PM / shmobile: Don't skip debugging output in pd_power_up()

Btrfs: sectorsize align offsets in fiemap

We've been hitting BUG()'s in btrfs_cont_expand and btrfs_fallocate and anywhere
else that calls btrfs_get_extent while running xfstests 13 in a loop.  This is
because fiemap is calling btrfs_get_extent with non-sectorsize aligned offsets,
which will end up adding mappings that are not sectorsize aligned, which will
cause problems in some cases for subsequent calls to btrfs_get_extent for
similar areas that are sectorsize aligned.  With this patch I ran xfstests 13 in
a loop for a couple of hours and didn't hit the problem that I could previously
hit in at most 20 minutes.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: clear pages dirty for io and set them extent mapped

When doing the io_ctl helpers to clean up the free space cache stuff I stopped
using our normal prepare_pages stuff, which means I of course forgot to do
things like set the pages extent mapped, which will cause us all sorts of
wonderful propblems. Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: wait on caching if we're loading the free space cache

We've been hitting panics when running xfstest 13 in a loop for long periods of
time.  And actually this problem has always existed so we've been hitting these
things randomly for a while.  Basically what happens is we get a thread coming
into the allocator and reading the space cache off of disk and adding the
entries to the free space cache as we go.  Then we get another thread that comes
in and tries to allocate from that block group.  Since block_group->cached !=
BTRFS_CACHE_NO it goes ahead and tries to do the allocation.  We do this because
if we're doing the old slow way of caching we don't want to hold people up and
wait for everything to finish.  The problem with this is we could end up
discarding the space cache at some arbitrary point in the future, which means we
could very well end up allocating space that is either bad, or when the real
caching happens it could end up thinking the space isn't in use when it really
is and cause all sorts of other problems.

The solution is to add a new flag to indicate we are loading the free space
cache from disk, and always try to cache the block group if cache->cached !=
BTRFS_CACHE_FINISHED.  That way if we are loading the space cache anybody else
who tries to allocate from the block group will have to wait until it's finished
to make sure it completes successfully.  Thanks,

Signed-off-by: Josef Bacik <josef@redhat.com>

Btrfs: prefix resize related printks with btrfs:

For the user it is confusing to find something like:
[10197.627710] new size for /dev/mapper/vg0-usr_share is 3221225472
in kernel log, because it doesn't point directly to btrfs.

This patch prefixes those messages with "btrfs:" like other btrfs
related printks.

Signed-off-by: Arnd Hannemann <arnd@arndnet.de>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

btrfs: fix stat blocks accounting

Round inode bytes and delalloc bytes up to real blocksize before
converting to sector size. Otherwise eg. files smaller than 512
are reported with zero blocks due to incorrect rounding.

Signed-off-by: David Sterba <dsterba@suse.cz>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: avoid unnecessary bitmap search for cluster setup

setup_cluster_no_bitmap() searches all the extents and bitmaps starting
from offset. Therefore if it returns -ENOSPC, all the bitmaps starting
from offset are in the bitmaps list, so it's sufficient to search from
this list in setup_cluser_bitmap().

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix to search one more bitmap for cluster setup

Suppose there are two bitmaps [0, 256], [256, 512] and one extent
[100, 120] in the free space cache, and we want to setup a cluster
with offset=100, bytes=50.

In this case, there will be only one bitmap [256, 512] in the temporary
bitmaps list, and then setup_cluster_bitmap() won't search bitmap [0, 256].

The cause is, the list is constructed in setup_cluster_no_bitmap(),
and only bitmaps with bitmap_entry->offset >= offset will be added
into the list, and the very bitmap that convers offset has
bitmap_entry->offset <= offset.

Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

btrfs: mirror_num should be int, not u64

My previous patch introduced some u64 for failed_mirror variables, this one
makes it consistent again.

Signed-off-by: Jan Schmidt <list.btrfs@jan-o-sch.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

btrfs: Fix up 32/64-bit compatibility for new ioctls

This patch casts to unsigned long before casting to a pointer and fixes
the following warnings:
fs/btrfs/extent_io.c:2289:20: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
fs/btrfs/ioctl.c:2933:37: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
fs/btrfs/ioctl.c:2937:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/ioctl.c:3020:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/scrub.c:275:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
fs/btrfs/backref.c:686:27: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]

Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>

Btrfs: fix barrier flushes

When btrfs is writing the super blocks, it send barrier flushes to make
sure writeback caching drives get all the metadata on disk in the
right order.

But, we have two bugs in the way these are sent down.  When doing
full commits (not via the tree log), we are sending the barrier down
before the last super when it should be going down before the first.

In multi-device setups, we should be waiting for the barriers to
complete on all devices before writing any of the supers.

Both of these bugs can cause corruptions on power failures.  We fix it
with some new code to send down empty barriers to all devices before
writing the first super.

Alexandre Oliva found the multi-device bug.  Arne Jansen did the async
barrier loop.

Signed-off-by: Chris Mason <chris.mason@oracle.com>
Reported-by: Alexandre Oliva <oliva@lsd.ic.unicamp.br>