oom: fix race while temporarily setting current's oom_score_adj
authorDavid Rientjes <rientjes@google.com>
Tue, 1 Nov 2011 00:07:18 +0000 (17:07 -0700)
committerLinus Torvalds <torvalds@linux-foundation.org>
Tue, 1 Nov 2011 00:30:45 +0000 (17:30 -0700)
test_set_oom_score_adj() was introduced in 72788c385604 ("oom: replace
PF_OOM_ORIGIN with toggling oom_score_adj") to temporarily elevate
current's oom_score_adj for ksm and swapoff without requiring an
additional per-process flag.

Using that function to both set oom_score_adj to OOM_SCORE_ADJ_MAX and
then reinstate the previous value is racy since it's possible that
userspace can set the value to something else itself before the old value
is reinstated.  That results in userspace setting current's oom_score_adj
to a different value and then the kernel immediately setting it back to
its previous value without notification.

To fix this, a new compare_swap_oom_score_adj() function is introduced
with the same semantics as the compare and swap CAS instruction, or
CMPXCHG on x86.  It is used to reinstate the previous value of
oom_score_adj if and only if the present value is the same as the old
value.

Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ying Han <yinghan@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
include/linux/oom.h
mm/ksm.c
mm/oom_kill.c
mm/swapfile.c

index 13b7b02e599aed40fcf09c2eea96dc0f03dc3d32..6f9d04a8533606afa788c351909c5dba74e54438 100644 (file)
@@ -40,6 +40,7 @@ enum oom_constraint {
        CONSTRAINT_MEMCG,
 };
 
+extern void compare_swap_oom_score_adj(int old_val, int new_val);
 extern int test_set_oom_score_adj(int new_val);
 
 extern unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
index 9a68b0cf0a1c4c8009ee25d2990530d7e2927132..310544a379ae9c7b886b3b50815e5f3d5a991ba8 100644 (file)
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -1905,7 +1905,8 @@ static ssize_t run_store(struct kobject *kobj, struct kobj_attribute *attr,
 
                        oom_score_adj = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX);
                        err = unmerge_and_remove_all_rmap_items();
-                       test_set_oom_score_adj(oom_score_adj);
+                       compare_swap_oom_score_adj(OOM_SCORE_ADJ_MAX,
+                                                               oom_score_adj);
                        if (err) {
                                ksm_run = KSM_RUN_STOP;
                                count = err;
index 2b97e8f04607b2ccb0011a3af5996aa41efe069b..e916168b6e0a8e6b70b4cb513f28591442ed8746 100644 (file)
@@ -39,6 +39,25 @@ int sysctl_oom_kill_allocating_task;
 int sysctl_oom_dump_tasks = 1;
 static DEFINE_SPINLOCK(zone_scan_lock);
 
+/*
+ * compare_swap_oom_score_adj() - compare and swap current's oom_score_adj
+ * @old_val: old oom_score_adj for compare
+ * @new_val: new oom_score_adj for swap
+ *
+ * Sets the oom_score_adj value for current to @new_val iff its present value is
+ * @old_val.  Usually used to reinstate a previous value to prevent racing with
+ * userspacing tuning the value in the interim.
+ */
+void compare_swap_oom_score_adj(int old_val, int new_val)
+{
+       struct sighand_struct *sighand = current->sighand;
+
+       spin_lock_irq(&sighand->siglock);
+       if (current->signal->oom_score_adj == old_val)
+               current->signal->oom_score_adj = new_val;
+       spin_unlock_irq(&sighand->siglock);
+}
+
 /**
  * test_set_oom_score_adj() - set current's oom_score_adj and return old value
  * @new_val: new oom_score_adj value
index 17bc224bce68c618285e9c74a3a2db0a4ec24c0f..c9d654009125d506053059262a63fe6e0637a394 100644 (file)
@@ -1617,7 +1617,7 @@ SYSCALL_DEFINE1(swapoff, const char __user *, specialfile)
 
        oom_score_adj = test_set_oom_score_adj(OOM_SCORE_ADJ_MAX);
        err = try_to_unuse(type);
-       test_set_oom_score_adj(oom_score_adj);
+       compare_swap_oom_score_adj(OOM_SCORE_ADJ_MAX, oom_score_adj);
 
        if (err) {
                /*