Documentation/memory-barriers.txt: Document ACCESS_ONCE()

author Paul E. McKenney <paulmck@linux.vnet.ibm.com>

Wed, 11 Dec 2013 21:59:07 +0000 (13:59 -0800)

committer Ingo Molnar <mingo@kernel.org>

Mon, 16 Dec 2013 10:36:12 +0000 (11:36 +0100)
author Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Wed, 11 Dec 2013 21:59:07 +0000 (13:59 -0800)
committer Ingo Molnar <mingo@kernel.org>
Mon, 16 Dec 2013 10:36:12 +0000 (11:36 +0100)
diff --git a/Documentation/memory-barriers.txt b/Documentation/memory-barriers.txt

index deafa36aeea1033e7bb460cf62a886aef611442d..919fd604969d68a3b84df01f2364c4db46d46ac2 100644 (file)
--- a/Documentation/memory-barriers.txt
+++ b/Documentation/memory-barriers.txt
@@ -231,37 +231,8 @@ And there are a number of things that _must_ or _must_not_ be assumed:
   (*) It _must_not_ be assumed that the compiler will do what you want with
       memory references that are not protected by ACCESS_ONCE().  Without
       ACCESS_ONCE(), the compiler is within its rights to do all sorts
-     of "creative" transformations:
-
-     (-) Repeat the load, possibly getting a different value on the second
-         and subsequent loads.  This is especially prone to happen when
-        register pressure is high.
-
-     (-) Merge adjacent loads and stores to the same location.  The most
-         familiar example is the transformation from:
-
-               while (a)
-                       do_something();
-
-         to something like:
-
-               if (a)
-                       for (;;)
-                               do_something();
-
-         Using ACCESS_ONCE() as follows prevents this sort of optimization:
-
-               while (ACCESS_ONCE(a))
-                       do_something();
-
-     (-) "Store tearing", where a single store in the source code is split
-         into smaller stores in the object code.  Note that gcc really
-        will do this on some architectures when storing certain constants.
-        It can be cheaper to do a series of immediate stores than to
-        form the constant in a register and then to store that register.
-
-     (-) "Load tearing", which splits loads in a manner analogous to
-        store tearing.
+     of "creative" transformations, which are covered in the Compiler
+     Barrier section.
  
   (*) It _must_not_ be assumed that independent loads and stores will be issued
       in the order given.  This means that for:
@@ -749,7 +720,8 @@ In summary:
  
    (*) Control dependencies require that the compiler avoid reordering the
        dependency into nonexistence.  Careful use of ACCESS_ONCE() or
-      barrier() can help to preserve your control dependency.
+      barrier() can help to preserve your control dependency.  Please
+      see the Compiler Barrier section for more information.
  
    (*) Control dependencies do -not- provide transitivity.  If you
        need transitivity, use smp_mb().
@@ -1248,12 +1220,276 @@ compiler from moving the memory accesses either side of it to the other side:
         barrier();
  
  This is a general barrier -- there are no read-read or write-write variants
-of barrier().  Howevever, ACCESS_ONCE() can be thought of as a weak form
+of barrier().  However, ACCESS_ONCE() can be thought of as a weak form
  for barrier() that affects only the specific accesses flagged by the
  ACCESS_ONCE().
  
-The compiler barrier has no direct effect on the CPU, which may then reorder
-things however it wishes.
+The barrier() function has the following effects:
+
+ (*) Prevents the compiler from reordering accesses following the
+     barrier() to precede any accesses preceding the barrier().
+     One example use for this property is to ease communication between
+     interrupt-handler code and the code that was interrupted.
+
+ (*) Within a loop, forces the compiler to load the variables used
+     in that loop's conditional on each pass through that loop.
+
+The ACCESS_ONCE() function can prevent any number of optimizations that,
+while perfectly safe in single-threaded code, can be fatal in concurrent
+code.  Here are some examples of these sorts of optimizations:
+
+ (*) The compiler is within its rights to merge successive loads from
+     the same variable.  Such merging can cause the compiler to "optimize"
+     the following code:
+
+       while (tmp = a)
+               do_something_with(tmp);
+
+     into the following code, which, although in some sense legitimate
+     for single-threaded code, is almost certainly not what the developer
+     intended:
+
+       if (tmp = a)
+               for (;;)
+                       do_something_with(tmp);
+
+     Use ACCESS_ONCE() to prevent the compiler from doing this to you:
+
+       while (tmp = ACCESS_ONCE(a))
+               do_something_with(tmp);
+
+ (*) The compiler is within its rights to reload a variable, for example,
+     in cases where high register pressure prevents the compiler from
+     keeping all data of interest in registers.  The compiler might
+     therefore optimize the variable 'tmp' out of our previous example:
+
+       while (tmp = a)
+               do_something_with(tmp);
+
+     This could result in the following code, which is perfectly safe in
+     single-threaded code, but can be fatal in concurrent code:
+
+       while (a)
+               do_something_with(a);
+
+     For example, the optimized version of this code could result in
+     passing a zero to do_something_with() in the case where the variable
+     a was modified by some other CPU between the "while" statement and
+     the call to do_something_with().
+
+     Again, use ACCESS_ONCE() to prevent the compiler from doing this:
+
+       while (tmp = ACCESS_ONCE(a))
+               do_something_with(tmp);
+
+     Note that if the compiler runs short of registers, it might save
+     tmp onto the stack.  The overhead of this saving and later restoring
+     is why compilers reload variables.  Doing so is perfectly safe for
+     single-threaded code, so you need to tell the compiler about cases
+     where it is not safe.
+
+ (*) The compiler is within its rights to omit a load entirely if it knows
+     what the value will be.  For example, if the compiler can prove that
+     the value of variable 'a' is always zero, it can optimize this code:
+
+       while (tmp = a)
+               do_something_with(tmp);
+
+     Into this:
+
+       do { } while (0);
+
+     This transformation is a win for single-threaded code because it gets
+     rid of a load and a branch.  The problem is that the compiler will
+     carry out its proof assuming that the current CPU is the only one
+     updating variable 'a'.  If variable 'a' is shared, then the compiler's
+     proof will be erroneous.  Use ACCESS_ONCE() to tell the compiler
+     that it doesn't know as much as it thinks it does:
+
+       while (tmp = ACCESS_ONCE(a))
+               do_something_with(tmp);
+
+     But please note that the compiler is also closely watching what you
+     do with the value after the ACCESS_ONCE().  For example, suppose you
+     do the following and MAX is a preprocessor macro with the value 1:
+
+       while ((tmp = ACCESS_ONCE(a)) % MAX)
+               do_something_with(tmp);
+
+     Then the compiler knows that the result of the "%" operator applied
+     to MAX will always be zero, again allowing the compiler to optimize
+     the code into near-nonexistence.  (It will still load from the
+     variable 'a'.)
+
+ (*) Similarly, the compiler is within its rights to omit a store entirely
+     if it knows that the variable already has the value being stored.
+     Again, the compiler assumes that the current CPU is the only one
+     storing into the variable, which can cause the compiler to do the
+     wrong thing for shared variables.  For example, suppose you have
+     the following:
+
+       a = 0;
+       /* Code that does not store to variable a. */
+       a = 0;
+
+     The compiler sees that the value of variable 'a' is already zero, so
+     it might well omit the second store.  This would come as a fatal
+     surprise if some other CPU might have stored to variable 'a' in the
+     meantime.
+
+     Use ACCESS_ONCE() to prevent the compiler from making this sort of
+     wrong guess:
+
+       ACCESS_ONCE(a) = 0;
+       /* Code that does not store to variable a. */
+       ACCESS_ONCE(a) = 0;
+
+ (*) The compiler is within its rights to reorder memory accesses unless
+     you tell it not to.  For example, consider the following interaction
+     between process-level code and an interrupt handler:
+
+       void process_level(void)
+       {
+               msg = get_message();
+               flag = true;
+       }
+
+       void interrupt_handler(void)
+       {
+               if (flag)
+                       process_message(msg);
+       }
+
+     There is nothing to prevent the the compiler from transforming
+     process_level() to the following, in fact, this might well be a
+     win for single-threaded code:
+
+       void process_level(void)
+       {
+               flag = true;
+               msg = get_message();
+       }
+
+     If the interrupt occurs between these two statement, then
+     interrupt_handler() might be passed a garbled msg.  Use ACCESS_ONCE()
+     to prevent this as follows:
+
+       void process_level(void)
+       {
+               ACCESS_ONCE(msg) = get_message();
+               ACCESS_ONCE(flag) = true;
+       }
+
+       void interrupt_handler(void)
+       {
+               if (ACCESS_ONCE(flag))
+                       process_message(ACCESS_ONCE(msg));
+       }
+
+     Note that the ACCESS_ONCE() wrappers in interrupt_handler()
+     are needed if this interrupt handler can itself be interrupted
+     by something that also accesses 'flag' and 'msg', for example,
+     a nested interrupt or an NMI.  Otherwise, ACCESS_ONCE() is not
+     needed in interrupt_handler() other than for documentation purposes.
+     (Note also that nested interrupts do not typically occur in modern
+     Linux kernels, in fact, if an interrupt handler returns with
+     interrupts enabled, you will get a WARN_ONCE() splat.)
+
+     You should assume that the compiler can move ACCESS_ONCE() past
+     code not containing ACCESS_ONCE(), barrier(), or similar primitives.
+
+     This effect could also be achieved using barrier(), but ACCESS_ONCE()
+     is more selective:  With ACCESS_ONCE(), the compiler need only forget
+     the contents of the indicated memory locations, while with barrier()
+     the compiler must discard the value of all memory locations that
+     it has currented cached in any machine registers.  Of course,
+     the compiler must also respect the order in which the ACCESS_ONCE()s
+     occur, though the CPU of course need not do so.
+
+ (*) The compiler is within its rights to invent stores to a variable,
+     as in the following example:
+
+       if (a)
+               b = a;
+       else
+               b = 42;
+
+     The compiler might save a branch by optimizing this as follows:
+
+       b = 42;
+       if (a)
+               b = a;
+
+     In single-threaded code, this is not only safe, but also saves
+     a branch.  Unfortunately, in concurrent code, this optimization
+     could cause some other CPU to see a spurious value of 42 -- even
+     if variable 'a' was never zero -- when loading variable 'b'.
+     Use ACCESS_ONCE() to prevent this as follows:
+
+       if (a)
+               ACCESS_ONCE(b) = a;
+       else
+               ACCESS_ONCE(b) = 42;
+
+     The compiler can also invent loads.  These are usually less
+     damaging, but they can result in cache-line bouncing and thus in
+     poor performance and scalability.  Use ACCESS_ONCE() to prevent
+     invented loads.
+
+ (*) For aligned memory locations whose size allows them to be accessed
+     with a single memory-reference instruction, prevents "load tearing"
+     and "store tearing," in which a single large access is replaced by
+     multiple smaller accesses.  For example, given an architecture having
+     16-bit store instructions with 7-bit immediate fields, the compiler
+     might be tempted to use two 16-bit store-immediate instructions to
+     implement the following 32-bit store:
+
+       p = 0x00010002;
+
+     Please note that GCC really does use this sort of optimization,
+     which is not surprising given that it would likely take more
+     than two instructions to build the constant and then store it.
+     This optimization can therefore be a win in single-threaded code.
+     In fact, a recent bug (since fixed) caused GCC to incorrectly use
+     this optimization in a volatile store.  In the absence of such bugs,
+     use of ACCESS_ONCE() prevents store tearing in the following example:
+
+       ACCESS_ONCE(p) = 0x00010002;
+
+     Use of packed structures can also result in load and store tearing,
+     as in this example:
+
+       struct __attribute__((__packed__)) foo {
+               short a;
+               int b;
+               short c;
+       };
+       struct foo foo1, foo2;
+       ...
+
+       foo2.a = foo1.a;
+       foo2.b = foo1.b;
+       foo2.c = foo1.c;
+
+     Because there are no ACCESS_ONCE() wrappers and no volatile markings,
+     the compiler would be well within its rights to implement these three
+     assignment statements as a pair of 32-bit loads followed by a pair
+     of 32-bit stores.  This would result in load tearing on 'foo1.b'
+     and store tearing on 'foo2.b'.  ACCESS_ONCE() again prevents tearing
+     in this example:
+
+       foo2.a = foo1.a;
+       ACCESS_ONCE(foo2.b) = ACCESS_ONCE(foo1.b);
+       foo2.c = foo1.c;
+
+All that aside, it is never necessary to use ACCESS_ONCE() on a variable
+that has been marked volatile.  For example, because 'jiffies' is marked
+volatile, it is never necessary to say ACCESS_ONCE(jiffies).  The reason
+for this is that ACCESS_ONCE() is implemented as a volatile cast, which
+has no effect when its argument is already marked volatile.
+
+Please note that these compiler barriers have no direct effect on the CPU,
+which may then reorder things however it wishes.
  
  
  CPU MEMORY BARRIERS
author	Paul E. McKenney <paulmck@linux.vnet.ibm.com>
	Wed, 11 Dec 2013 21:59:07 +0000 (13:59 -0800)
committer	Ingo Molnar <mingo@kernel.org>
	Mon, 16 Dec 2013 10:36:12 +0000 (11:36 +0100)