--- /dev/null
+.. _up_doc:
+
+RCU on Uniprocessor Systems
+===========================
+
+A common misconception is that, on UP systems, the call_rcu() primitive
+may immediately invoke its function. The basis of this misconception
+is that since there is only one CPU, it should not be necessary to
+wait for anything else to get done, since there are no other CPUs for
+anything else to be happening on. Although this approach will *sort of*
+work a surprising amount of the time, it is a very bad idea in general.
+This document presents three examples that demonstrate exactly how bad
+an idea this is.
+
+Example 1: softirq Suicide
+--------------------------
+
+Suppose that an RCU-based algorithm scans a linked list containing
+elements A, B, and C in process context, and can delete elements from
+this same list in softirq context. Suppose that the process-context scan
+is referencing element B when it is interrupted by softirq processing,
+which deletes element B, and then invokes call_rcu() to free element B
+after a grace period.
+
+Now, if call_rcu() were to directly invoke its arguments, then upon return
+from softirq, the list scan would find itself referencing a newly freed
+element B. This situation can greatly decrease the life expectancy of
+your kernel.
+
+This same problem can occur if call_rcu() is invoked from a hardware
+interrupt handler.
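+
+The hazard can be mimicked in user space. The following is a minimal
+sketch, not kernel code: every toy_*() name is hypothetical, a counter
+stands in for "a reader is running", and a "freed" flag stands in for
+kfree(). It shows only the property that matters here, namely that the
+callback is queued and runs after the interrupted reader has finished:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model of deferred callbacks; not the kernel API. */
struct toy_rcu_head {
	struct toy_rcu_head *next;
	void (*func)(struct toy_rcu_head *head);
};

struct toy_elem {
	struct toy_rcu_head rcu;	/* first member, so the cast below is valid */
	int key;
	bool freed;			/* stands in for kfree() */
};

static struct toy_rcu_head *toy_pending;	/* callbacks awaiting a grace period */
static int toy_reader_nesting;			/* > 0 while a reader is running */

static void toy_read_lock(void)
{
	toy_reader_nesting++;
}

static void toy_read_unlock(void)
{
	toy_reader_nesting--;
}

/* Queue the callback instead of invoking it on the spot. */
static void toy_call_rcu(struct toy_rcu_head *head,
			 void (*func)(struct toy_rcu_head *head))
{
	head->func = func;
	head->next = toy_pending;
	toy_pending = head;
}

/* Invoke queued callbacks, but only once no reader is running. */
static int toy_run_callbacks(void)
{
	int invoked = 0;

	if (toy_reader_nesting > 0)
		return 0;
	while (toy_pending) {
		struct toy_rcu_head *head = toy_pending;

		toy_pending = head->next;
		head->func(head);
		invoked++;
	}
	return invoked;
}

/* Callback: mark the element freed rather than really freeing it. */
static void toy_free_elem(struct toy_rcu_head *head)
{
	struct toy_elem *e = (struct toy_elem *)head;

	e->freed = true;
}
```

+If toy_call_rcu() is invoked between toy_read_lock() and
+toy_read_unlock(), toy_run_callbacks() does nothing until the reader
+has exited, so the reader never sees a freed element B.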
+
+Example 2: Function-Call Fatality
+---------------------------------
+
+Of course, one could avert the suicide described in the preceding example
+by having call_rcu() directly invoke its arguments only if it was called
+from process context. However, this can fail in a similar manner.
+
+Suppose that an RCU-based algorithm again scans a linked list containing
+elements A, B, and C in process context, but that it invokes a function
+on each element as it is scanned. Suppose further that this function
+deletes element B from the list, then passes it to call_rcu() for deferred
+freeing. This may be a bit unconventional, but it is perfectly legal
+RCU usage, since call_rcu() must wait for a grace period to elapse.
+Therefore, in this case, allowing call_rcu() to immediately invoke
+its arguments would cause it to fail to make the fundamental guarantee
+underlying RCU, namely that call_rcu() defers invoking its arguments until
+all RCU read-side critical sections currently executing have completed.
+
+Quick Quiz #1:
+ Why is it *not* legal to invoke synchronize_rcu() in this case?
+
+:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`
+
+Example 3: Death by Deadlock
+----------------------------
+
+Suppose that call_rcu() is invoked while holding a lock, and that the
+callback function must acquire this same lock. In this case, if
+call_rcu() were to directly invoke the callback, the result would
+be self-deadlock.
+
+In some cases, it would be possible to restructure the code so that
+the call_rcu() is delayed until after the lock is released. However,
+there are cases where this can be quite ugly:
+
+1. If a number of items need to be passed to call_rcu() within
+   the same critical section, then the code would need to create
+   a list of them, then traverse the list once the lock was
+   released.
+
+2. In some cases, the lock will be held across some kernel API,
+   so that delaying the call_rcu() until the lock is released
+   requires that the data item be passed up via a common API.
+   It is far better to guarantee that callbacks are invoked
+   with no locks held than to have to modify such APIs to allow
+   arbitrary data items to be passed back up through them.
+
+If call_rcu() directly invokes the callback, painful locking restrictions
+or API changes would be required.
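+
+The self-deadlock can be illustrated with a user-space toy. This is a
+sketch under stated assumptions, not kernel code: all toy_*() names are
+hypothetical, and toy_trylock() reports failure at the point where a
+real non-recursive spinlock would spin forever:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical non-recursive lock; not a kernel primitive. */
struct toy_lock {
	bool held;
};

/* Returns false where a real spinlock would self-deadlock. */
static bool toy_trylock(struct toy_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

static void toy_unlock(struct toy_lock *l)
{
	l->held = false;
}

static struct toy_lock mylock;
static int deadlocks;		/* times the callback found mylock held */
static int callbacks_done;	/* times the callback ran to completion */

/* The callback must acquire the same lock that the updater holds. */
static void toy_callback(void)
{
	if (!toy_trylock(&mylock)) {
		deadlocks++;
		return;
	}
	callbacks_done++;
	toy_unlock(&mylock);
}

static void (*toy_deferred)(void);	/* at most one queued callback */

/* Broken variant: invoke the callback on the spot. */
static void toy_call_rcu_immediate(void (*func)(void))
{
	func();
}

/* Deferred variant: remember the callback for later. */
static void toy_call_rcu_deferred(void (*func)(void))
{
	toy_deferred = func;
}

/* Run the queued callback from a context that holds no locks. */
static void toy_invoke_deferred(void)
{
	void (*func)(void) = toy_deferred;

	toy_deferred = NULL;
	if (func)
		func();
}

/* Updater: calls the chosen call_rcu() variant while holding mylock. */
static void toy_update(void (*call_rcu_variant)(void (*func)(void)))
{
	if (!toy_trylock(&mylock))
		return;
	call_rcu_variant(toy_callback);
	toy_unlock(&mylock);
}
```

+With toy_call_rcu_immediate(), the callback runs while mylock is still
+held and hits the deadlock case. With toy_call_rcu_deferred() followed
+by toy_invoke_deferred(), it runs with no locks held and completes.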
+
+Quick Quiz #2:
+ What locking restriction must RCU callbacks respect?
+
+:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`
+
+Summary
+-------
+
+Permitting call_rcu() to immediately invoke its arguments breaks RCU,
+even on a UP system. So do not do it! Even on a UP system, the RCU
+infrastructure *must* respect grace periods, and *must* invoke callbacks
+from a known environment in which no locks are held.
+
+Note that it *is* safe for synchronize_rcu() to return immediately on
+UP systems, including PREEMPT SMP builds running on UP systems.
+
+Quick Quiz #3:
+ Why can't synchronize_rcu() return immediately on UP systems running
+ preemptable RCU?
+
+:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`
+
+.. _answer_quick_quiz_up:
+
+Answer to Quick Quiz #1:
+ Why is it *not* legal to invoke synchronize_rcu() in this case?
+
+ Because the calling function is scanning an RCU-protected linked
+ list, and is therefore within an RCU read-side critical section.
+ Therefore, the called function has been invoked within an RCU
+ read-side critical section, and is not permitted to block.
+
+Answer to Quick Quiz #2:
+ What locking restriction must RCU callbacks respect?
+
+ Any lock that is acquired within an RCU callback must be
+ acquired elsewhere using an _irq variant of the spinlock
+ primitive. For example, if "mylock" is acquired by an
+ RCU callback, then a process-context acquisition of this
+ lock must use something like spin_lock_irqsave() to
+ acquire the lock.
+
+ If the process-context code were to simply use spin_lock(),
+ then, since RCU callbacks can be invoked from softirq context,
+ the callback might be called from a softirq that interrupted
+ the process-context critical section. This would result in
+ self-deadlock.
+
+ This restriction might seem gratuitous, since very few RCU
+ callbacks acquire locks directly. However, a great many RCU
+ callbacks do acquire locks *indirectly*, for example, via
+ the kfree() primitive.
+
+Answer to Quick Quiz #3:
+ Why can't synchronize_rcu() return immediately on UP systems
+ running preemptable RCU?
+
+ Because some other task might have been preempted in the middle
+ of an RCU read-side critical section. If synchronize_rcu()
+ simply immediately returned, it would prematurely signal the
+ end of the grace period, which would come as a nasty shock to
+ that other thread when it started running again.
+++ /dev/null
-.. _up_doc:
-
-RCU on Uniprocessor Systems
-===========================
-
-A common misconception is that, on UP systems, the call_rcu() primitive
-may immediately invoke its function. The basis of this misconception
-is that since there is only one CPU, it should not be necessary to
-wait for anything else to get done, since there are no other CPUs for
-anything else to be happening on. Although this approach will *sort of*
-work a surprising amount of the time, it is a very bad idea in general.
-This document presents three examples that demonstrate exactly how bad
-an idea this is.
-
-Example 1: softirq Suicide
---------------------------
-
-Suppose that an RCU-based algorithm scans a linked list containing
-elements A, B, and C in process context, and can delete elements from
-this same list in softirq context. Suppose that the process-context scan
-is referencing element B when it is interrupted by softirq processing,
-which deletes element B, and then invokes call_rcu() to free element B
-after a grace period.
-
-Now, if call_rcu() were to directly invoke its arguments, then upon return
-from softirq, the list scan would find itself referencing a newly freed
-element B. This situation can greatly decrease the life expectancy of
-your kernel.
-
-This same problem can occur if call_rcu() is invoked from a hardware
-interrupt handler.
-
-Example 2: Function-Call Fatality
----------------------------------
-
-Of course, one could avert the suicide described in the preceding example
-by having call_rcu() directly invoke its arguments only if it was called
-from process context. However, this can fail in a similar manner.
-
-Suppose that an RCU-based algorithm again scans a linked list containing
-elements A, B, and C in process contexts, but that it invokes a function
-on each element as it is scanned. Suppose further that this function
-deletes element B from the list, then passes it to call_rcu() for deferred
-freeing. This may be a bit unconventional, but it is perfectly legal
-RCU usage, since call_rcu() must wait for a grace period to elapse.
-Therefore, in this case, allowing call_rcu() to immediately invoke
-its arguments would cause it to fail to make the fundamental guarantee
-underlying RCU, namely that call_rcu() defers invoking its arguments until
-all RCU read-side critical sections currently executing have completed.
-
-Quick Quiz #1:
- Why is it *not* legal to invoke synchronize_rcu() in this case?
-
-:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`
-
-Example 3: Death by Deadlock
-----------------------------
-
-Suppose that call_rcu() is invoked while holding a lock, and that the
-callback function must acquire this same lock. In this case, if
-call_rcu() were to directly invoke the callback, the result would
-be self-deadlock.
-
-In some cases, it would possible to restructure to code so that
-the call_rcu() is delayed until after the lock is released. However,
-there are cases where this can be quite ugly:
-
-1. If a number of items need to be passed to call_rcu() within
- the same critical section, then the code would need to create
- a list of them, then traverse the list once the lock was
- released.
-
-2. In some cases, the lock will be held across some kernel API,
- so that delaying the call_rcu() until the lock is released
- requires that the data item be passed up via a common API.
- It is far better to guarantee that callbacks are invoked
- with no locks held than to have to modify such APIs to allow
- arbitrary data items to be passed back up through them.
-
-If call_rcu() directly invokes the callback, painful locking restrictions
-or API changes would be required.
-
-Quick Quiz #2:
- What locking restriction must RCU callbacks respect?
-
-:ref:`Answers to Quick Quiz <answer_quick_quiz_up>`
-
-Summary
--------
-
-Permitting call_rcu() to immediately invoke its arguments breaks RCU,
-even on a UP system. So do not do it! Even on a UP system, the RCU
-infrastructure *must* respect grace periods, and *must* invoke callbacks
-from a known environment in which no locks are held.
-
-Note that it *is* safe for synchronize_rcu() to return immediately on
-UP systems, including PREEMPT SMP builds running on UP systems.
-
-Quick Quiz #3:
- Why can't synchronize_rcu() return immediately on UP systems running
- preemptable RCU?
-
-.. _answer_quick_quiz_up:
-
-Answer to Quick Quiz #1:
- Why is it *not* legal to invoke synchronize_rcu() in this case?
-
- Because the calling function is scanning an RCU-protected linked
- list, and is therefore within an RCU read-side critical section.
- Therefore, the called function has been invoked within an RCU
- read-side critical section, and is not permitted to block.
-
-Answer to Quick Quiz #2:
- What locking restriction must RCU callbacks respect?
-
- Any lock that is acquired within an RCU callback must be
- acquired elsewhere using an _irq variant of the spinlock
- primitive. For example, if "mylock" is acquired by an
- RCU callback, then a process-context acquisition of this
- lock must use something like spin_lock_irqsave() to
- acquire the lock.
-
- If the process-context code were to simply use spin_lock(),
- then, since RCU callbacks can be invoked from softirq context,
- the callback might be called from a softirq that interrupted
- the process-context critical section. This would result in
- self-deadlock.
-
- This restriction might seem gratuitous, since very few RCU
- callbacks acquire locks directly. However, a great many RCU
- callbacks do acquire locks *indirectly*, for example, via
- the kfree() primitive.
-
-Answer to Quick Quiz #3:
- Why can't synchronize_rcu() return immediately on UP systems
- running preemptable RCU?
-
- Because some other task might have been preempted in the middle
- of an RCU read-side critical section. If synchronize_rcu()
- simply immediately returned, it would prematurely signal the
- end of the grace period, which would come as a nasty shock to
- that other thread when it started running again.
--- /dev/null
+.. _list_rcu_doc:
+
+Using RCU to Protect Read-Mostly Linked Lists
+=============================================
+
+One of the best applications of RCU is to protect read-mostly linked lists
+("struct list_head" in list.h). One big advantage of this approach
+is that all of the required memory barriers are included for you in
+the list macros. This document describes several applications of RCU,
+with the best fits first.
+
+Example 1: Read-Side Action Taken Outside of Lock, No In-Place Updates
+----------------------------------------------------------------------
+
+The best applications are cases where, if reader-writer locking were
+used, the read-side lock would be dropped before taking any action
+based on the results of the search. The most celebrated example is
+the routing table. Because the routing table is tracking the state of
+equipment outside of the computer, it will at times contain stale data.
+Therefore, once the route has been computed, there is no need to hold
+the routing table static during transmission of the packet. After all,
+you can hold the routing table static all you want, but that won't keep
+the external Internet from changing, and it is the state of the external
+Internet that really matters. In addition, routing entries are typically
+added or deleted, rather than being modified in place.
+
+A straightforward example of this use of RCU may be found in the
+system-call auditing support. For example, a reader-writer locked
+implementation of audit_filter_task() might be as follows::
+
+	static enum audit_state audit_filter_task(struct task_struct *tsk)
+	{
+		struct audit_entry *e;
+		enum audit_state state;
+
+		read_lock(&auditsc_lock);
+		/* Note: audit_netlink_sem held by caller. */
+		list_for_each_entry(e, &audit_tsklist, list) {
+			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
+				read_unlock(&auditsc_lock);
+				return state;
+			}
+		}
+		read_unlock(&auditsc_lock);
+		return AUDIT_BUILD_CONTEXT;
+	}
+
+Here the list is searched under the lock, but the lock is dropped before
+the corresponding value is returned. By the time that this value is acted
+on, the list may well have been modified. This makes sense, since if
+you are turning auditing off, it is OK to audit a few extra system calls.
+
+This means that RCU can be easily applied to the read side, as follows::
+
+	static enum audit_state audit_filter_task(struct task_struct *tsk)
+	{
+		struct audit_entry *e;
+		enum audit_state state;
+
+		rcu_read_lock();
+		/* Note: audit_netlink_sem held by caller. */
+		list_for_each_entry_rcu(e, &audit_tsklist, list) {
+			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
+				rcu_read_unlock();
+				return state;
+			}
+		}
+		rcu_read_unlock();
+		return AUDIT_BUILD_CONTEXT;
+	}
+
+The read_lock() and read_unlock() calls have become rcu_read_lock()
+and rcu_read_unlock(), respectively, and the list_for_each_entry() has
+become list_for_each_entry_rcu(). The _rcu() list-traversal primitives
+insert the read-side memory barriers that are required on DEC Alpha CPUs.
+
+The changes to the update side are also straightforward. A reader-writer
+lock might be used as follows for deletion and insertion::
+
+	static inline int audit_del_rule(struct audit_rule *rule,
+					 struct list_head *list)
+	{
+		struct audit_entry *e;
+
+		write_lock(&auditsc_lock);
+		list_for_each_entry(e, list, list) {
+			if (!audit_compare_rule(rule, &e->rule)) {
+				list_del(&e->list);
+				write_unlock(&auditsc_lock);
+				return 0;
+			}
+		}
+		write_unlock(&auditsc_lock);
+		return -EFAULT;		/* No matching rule */
+	}
+
+	static inline int audit_add_rule(struct audit_entry *entry,
+					 struct list_head *list)
+	{
+		write_lock(&auditsc_lock);
+		if (entry->rule.flags & AUDIT_PREPEND) {
+			entry->rule.flags &= ~AUDIT_PREPEND;
+			list_add(&entry->list, list);
+		} else {
+			list_add_tail(&entry->list, list);
+		}
+		write_unlock(&auditsc_lock);
+		return 0;
+	}
+
+Following are the RCU equivalents for these two functions::
+
+	static inline int audit_del_rule(struct audit_rule *rule,
+					 struct list_head *list)
+	{
+		struct audit_entry *e;
+
+		/* Do not need to use the _rcu iterator here, since this
+		 * is the only deletion routine. */
+		list_for_each_entry(e, list, list) {
+			if (!audit_compare_rule(rule, &e->rule)) {
+				list_del_rcu(&e->list);
+				call_rcu(&e->rcu, audit_free_rule);
+				return 0;
+			}
+		}
+		return -EFAULT;		/* No matching rule */
+	}
+
+	static inline int audit_add_rule(struct audit_entry *entry,
+					 struct list_head *list)
+	{
+		if (entry->rule.flags & AUDIT_PREPEND) {
+			entry->rule.flags &= ~AUDIT_PREPEND;
+			list_add_rcu(&entry->list, list);
+		} else {
+			list_add_tail_rcu(&entry->list, list);
+		}
+		return 0;
+	}
+
+Normally, the write_lock() and write_unlock() would be replaced by
+a spin_lock() and a spin_unlock(). In this case, however, all callers
+hold audit_netlink_sem, so no additional locking is required. The
+auditsc_lock can therefore be eliminated, since use of RCU eliminates
+the need for writers to exclude readers.
+
+The list_del(), list_add(), and list_add_tail() primitives have been
+replaced by list_del_rcu(), list_add_rcu(), and list_add_tail_rcu().
+The _rcu() list-manipulation primitives add memory barriers that are
+needed on weakly ordered CPUs (most of them!). The list_del_rcu()
+primitive omits the pointer poisoning debug-assist code that would
+otherwise cause concurrent readers to fail spectacularly.
+
+So, when readers can tolerate stale data and when entries are either added
+or deleted, without in-place modification, it is very easy to use RCU!
+
+Example 2: Handling In-Place Updates
+------------------------------------
+
+The system-call auditing code does not update auditing rules in place.
+However, if it did, reader-writer-locked code to do so might look as
+follows (presumably, the field_count is only permitted to decrease,
+otherwise, the added fields would need to be filled in)::
+
+	static inline int audit_upd_rule(struct audit_rule *rule,
+					 struct list_head *list,
+					 __u32 newaction,
+					 __u32 newfield_count)
+	{
+		struct audit_entry *e;
+
+		write_lock(&auditsc_lock);
+		/* Note: audit_netlink_sem held by caller. */
+		list_for_each_entry(e, list, list) {
+			if (!audit_compare_rule(rule, &e->rule)) {
+				e->rule.action = newaction;
+				e->rule.field_count = newfield_count;
+				write_unlock(&auditsc_lock);
+				return 0;
+			}
+		}
+		write_unlock(&auditsc_lock);
+		return -EFAULT;		/* No matching rule */
+	}
+
+The RCU version creates a copy, updates the copy, then replaces the old
+entry with the newly updated entry. This sequence of actions, allowing
+concurrent reads while doing a copy to perform an update, is what gives
+RCU ("read-copy update") its name. The RCU code is as follows::
+
+	static inline int audit_upd_rule(struct audit_rule *rule,
+					 struct list_head *list,
+					 __u32 newaction,
+					 __u32 newfield_count)
+	{
+		struct audit_entry *e;
+		struct audit_entry *ne;
+
+		list_for_each_entry(e, list, list) {
+			if (!audit_compare_rule(rule, &e->rule)) {
+				/* Copy the old entry, then update the copy. */
+				ne = kmalloc(sizeof(*ne), GFP_ATOMIC);
+				if (ne == NULL)
+					return -ENOMEM;
+				audit_copy_rule(&ne->rule, &e->rule);
+				ne->rule.action = newaction;
+				ne->rule.field_count = newfield_count;
+				/* Publish the copy, then free the old entry
+				 * after a grace period. */
+				list_replace_rcu(&e->list, &ne->list);
+				call_rcu(&e->rcu, audit_free_rule);
+				return 0;
+			}
+		}
+		return -EFAULT;		/* No matching rule */
+	}
+
+Again, this assumes that the caller holds audit_netlink_sem. Normally,
+the reader-writer lock would become a spinlock in this sort of code.
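+
+The copy-update-replace sequence can also be sketched in user space.
+The following is an illustration only, with assumptions stated: the
+toy_*() names are hypothetical, a single C11 atomic pointer replaces
+the linked list, a release store and an acquire load stand in for the
+ordering that the kernel's _rcu() primitives provide, and there is
+exactly one updater at a time (as with audit_netlink_sem above). The
+sketch has no grace-period machinery, so the caller must not free the
+returned old version until all readers that might hold it are done:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a rule that is updated "in place". */
struct toy_rule {
	unsigned int action;
	unsigned int field_count;
};

/* The currently published version of the rule. */
static _Atomic(struct toy_rule *) toy_current;

/* Reader: a single load yields either the old or the new version,
 * never a mixture of the two. */
static struct toy_rule *toy_rule_get(void)
{
	return atomic_load_explicit(&toy_current, memory_order_acquire);
}

/*
 * Updater: copy the old version, modify the copy, then publish it.
 * Returns the old version so that the caller can free it later, or
 * NULL if there was nothing to update or the allocation failed.
 */
static struct toy_rule *toy_rule_update(unsigned int newaction,
					unsigned int newfield_count)
{
	struct toy_rule *old;
	struct toy_rule *copy;

	old = atomic_load_explicit(&toy_current, memory_order_relaxed);
	if (old == NULL)
		return NULL;
	copy = malloc(sizeof(*copy));
	if (copy == NULL)
		return NULL;
	*copy = *old;				/* copy */
	copy->action = newaction;		/* update the copy */
	copy->field_count = newfield_count;
	/* Replace: the release store orders the initialization above
	 * before the pointer becomes visible to readers. */
	atomic_store_explicit(&toy_current, copy, memory_order_release);
	return old;
}
```

+A reader that fetched the pointer before the update keeps seeing the
+old values, and a reader that fetches it afterwards sees the new ones.
+Neither can observe a half-updated rule.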
+
+Example 3: Eliminating Stale Data
+---------------------------------
+
+The auditing examples above tolerate stale data, as do most algorithms
+that are tracking external state. Because there is a delay between the
+time the external state changes and the time Linux becomes aware of the
+change, additional RCU-induced staleness is normally not a problem.
+
+However, there are many examples where stale data cannot be tolerated.
+One example in the Linux kernel is the System V IPC (see the ipc_lock()
+function in ipc/util.c). This code checks a "deleted" flag under a
+per-entry spinlock, and, if the "deleted" flag is set, pretends that the
+entry does not exist. For this to be helpful, the search function must
+return holding the per-entry spinlock, as ipc_lock() does in fact do.
+
+Quick Quiz:
+ Why does the search function need to return holding the per-entry lock for
+ this deleted-flag technique to be helpful?
+
+:ref:`Answer to Quick Quiz <answer_quick_quiz_list>`
+
+If the system-call audit module were to ever need to reject stale data,
+one way to accomplish this would be to add a "deleted" flag and a "lock"
+spinlock to the audit_entry structure, and modify audit_filter_task()
+as follows::
+
+	static enum audit_state audit_filter_task(struct task_struct *tsk)
+	{
+		struct audit_entry *e;
+		enum audit_state state;
+
+		rcu_read_lock();
+		list_for_each_entry_rcu(e, &audit_tsklist, list) {
+			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
+				spin_lock(&e->lock);
+				if (e->deleted) {
+					spin_unlock(&e->lock);
+					rcu_read_unlock();
+					return AUDIT_BUILD_CONTEXT;
+				}
+				/* Note: returns with e->lock held. */
+				rcu_read_unlock();
+				return state;
+			}
+		}
+		rcu_read_unlock();
+		return AUDIT_BUILD_CONTEXT;
+	}
+
+Note that this example assumes that entries are only added and deleted.
+Additional mechanism is required to deal correctly with the
+update-in-place performed by audit_upd_rule(). For one thing,
+audit_upd_rule() would need additional memory barriers to ensure
+that the list_add_rcu() was really executed before the list_del_rcu().
+
+The audit_del_rule() function would need to set the "deleted"
+flag under the spinlock as follows::
+
+	static inline int audit_del_rule(struct audit_rule *rule,
+					 struct list_head *list)
+	{
+		struct audit_entry *e;
+
+		/* Do not need to use the _rcu iterator here, since this
+		 * is the only deletion routine. */
+		list_for_each_entry(e, list, list) {
+			if (!audit_compare_rule(rule, &e->rule)) {
+				spin_lock(&e->lock);
+				list_del_rcu(&e->list);
+				e->deleted = 1;
+				spin_unlock(&e->lock);
+				call_rcu(&e->rcu, audit_free_rule);
+				return 0;
+			}
+		}
+		return -EFAULT;		/* No matching rule */
+	}
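+
+The "deleted"-flag protocol can be exercised outside the kernel. The
+sketch below makes several assumptions: the toy_*() names are
+hypothetical, a C11 atomic_flag spinlock stands in for the per-entry
+spinlock, and the list itself is omitted. It shows only that a lookup
+either returns the entry with its lock held or reports the entry as
+nonexistent:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical entry carrying the two fields described above. */
struct toy_entry {
	atomic_flag lock;	/* per-entry spinlock */
	bool deleted;		/* protected by ->lock */
	int value;
};

static void toy_entry_lock(struct toy_entry *e)
{
	while (atomic_flag_test_and_set_explicit(&e->lock,
						 memory_order_acquire))
		;	/* spin until the lock is released */
}

static void toy_entry_unlock(struct toy_entry *e)
{
	atomic_flag_clear_explicit(&e->lock, memory_order_release);
}

/*
 * Search side: returns the entry with its lock held, or NULL if the
 * entry has been marked deleted. The caller must release the lock
 * with toy_entry_unlock() once it is done using the entry.
 */
static struct toy_entry *toy_entry_get(struct toy_entry *e)
{
	toy_entry_lock(e);
	if (e->deleted) {
		toy_entry_unlock(e);
		return NULL;
	}
	return e;
}

/* Deletion side: set the flag under the same per-entry lock. */
static void toy_entry_delete(struct toy_entry *e)
{
	toy_entry_lock(e);
	e->deleted = true;
	toy_entry_unlock(e);
}
```

+Because toy_entry_get() returns with the lock held, toy_entry_delete()
+cannot mark the entry deleted while the caller is still using it.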
+
+Summary
+-------
+
+Read-mostly list-based data structures that can tolerate stale data are
+the most amenable to use of RCU. The simplest case is where entries are
+either added or deleted from the data structure (or atomically modified
+in place), but non-atomic in-place modifications can be handled by making
+a copy, updating the copy, then replacing the original with the copy.
+If stale data cannot be tolerated, then a "deleted" flag may be used
+in conjunction with a per-entry spinlock in order to allow the search
+function to reject newly deleted data.
+
+.. _answer_quick_quiz_list:
+
+Answer to Quick Quiz:
+ Why does the search function need to return holding the per-entry
+ lock for this deleted-flag technique to be helpful?
+
+ If the search function drops the per-entry lock before returning,
+ then the caller will be processing stale data in any case. If it
+ is really OK to be processing stale data, then you don't need a
+ "deleted" flag. If processing stale data really is a problem,
+ then you need to hold the per-entry lock across all of the code
+ that uses the value that was returned.
+++ /dev/null
-.. _list_rcu_doc:
-
-Using RCU to Protect Read-Mostly Linked Lists
-=============================================
-
-One of the best applications of RCU is to protect read-mostly linked lists
-("struct list_head" in list.h). One big advantage of this approach
-is that all of the required memory barriers are included for you in
-the list macros. This document describes several applications of RCU,
-with the best fits first.
-
-Example 1: Read-Side Action Taken Outside of Lock, No In-Place Updates
-----------------------------------------------------------------------
-
-The best applications are cases where, if reader-writer locking were
-used, the read-side lock would be dropped before taking any action
-based on the results of the search. The most celebrated example is
-the routing table. Because the routing table is tracking the state of
-equipment outside of the computer, it will at times contain stale data.
-Therefore, once the route has been computed, there is no need to hold
-the routing table static during transmission of the packet. After all,
-you can hold the routing table static all you want, but that won't keep
-the external Internet from changing, and it is the state of the external
-Internet that really matters. In addition, routing entries are typically
-added or deleted, rather than being modified in place.
-
-A straightforward example of this use of RCU may be found in the
-system-call auditing support. For example, a reader-writer locked
-implementation of audit_filter_task() might be as follows::
-
- static enum audit_state audit_filter_task(struct task_struct *tsk)
- {
- struct audit_entry *e;
- enum audit_state state;
-
- read_lock(&auditsc_lock);
- /* Note: audit_netlink_sem held by caller. */
- list_for_each_entry(e, &audit_tsklist, list) {
- if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
- read_unlock(&auditsc_lock);
- return state;
- }
- }
- read_unlock(&auditsc_lock);
- return AUDIT_BUILD_CONTEXT;
- }
-
-Here the list is searched under the lock, but the lock is dropped before
-the corresponding value is returned. By the time that this value is acted
-on, the list may well have been modified. This makes sense, since if
-you are turning auditing off, it is OK to audit a few extra system calls.
-
-This means that RCU can be easily applied to the read side, as follows::
-
- static enum audit_state audit_filter_task(struct task_struct *tsk)
- {
- struct audit_entry *e;
- enum audit_state state;
-
- rcu_read_lock();
- /* Note: audit_netlink_sem held by caller. */
- list_for_each_entry_rcu(e, &audit_tsklist, list) {
- if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
- rcu_read_unlock();
- return state;
- }
- }
- rcu_read_unlock();
- return AUDIT_BUILD_CONTEXT;
- }
-
-The read_lock() and read_unlock() calls have become rcu_read_lock()
-and rcu_read_unlock(), respectively, and the list_for_each_entry() has
-become list_for_each_entry_rcu(). The _rcu() list-traversal primitives
-insert the read-side memory barriers that are required on DEC Alpha CPUs.
-
-The changes to the update side are also straightforward. A reader-writer
-lock might be used as follows for deletion and insertion::
-
- static inline int audit_del_rule(struct audit_rule *rule,
- struct list_head *list)
- {
- struct audit_entry *e;
-
- write_lock(&auditsc_lock);
- list_for_each_entry(e, list, list) {
- if (!audit_compare_rule(rule, &e->rule)) {
- list_del(&e->list);
- write_unlock(&auditsc_lock);
- return 0;
- }
- }
- write_unlock(&auditsc_lock);
- return -EFAULT; /* No matching rule */
- }
-
- static inline int audit_add_rule(struct audit_entry *entry,
- struct list_head *list)
- {
- write_lock(&auditsc_lock);
- if (entry->rule.flags & AUDIT_PREPEND) {
- entry->rule.flags &= ~AUDIT_PREPEND;
- list_add(&entry->list, list);
- } else {
- list_add_tail(&entry->list, list);
- }
- write_unlock(&auditsc_lock);
- return 0;
- }
-
-Following are the RCU equivalents for these two functions::
-
- static inline int audit_del_rule(struct audit_rule *rule,
- struct list_head *list)
- {
- struct audit_entry *e;
-
- /* Do not use the _rcu iterator here, since this is the only
- * deletion routine. */
- list_for_each_entry(e, list, list) {
- if (!audit_compare_rule(rule, &e->rule)) {
- list_del_rcu(&e->list);
- call_rcu(&e->rcu, audit_free_rule);
- return 0;
- }
- }
- return -EFAULT; /* No matching rule */
- }
-
- static inline int audit_add_rule(struct audit_entry *entry,
- struct list_head *list)
- {
- if (entry->rule.flags & AUDIT_PREPEND) {
- entry->rule.flags &= ~AUDIT_PREPEND;
- list_add_rcu(&entry->list, list);
- } else {
- list_add_tail_rcu(&entry->list, list);
- }
- return 0;
- }
-
-Normally, the write_lock() and write_unlock() would be replaced by
-a spin_lock() and a spin_unlock(), but in this case, all callers hold
-audit_netlink_sem, so no additional locking is required. The auditsc_lock
-can therefore be eliminated, since use of RCU eliminates the need for
-writers to exclude readers. Normally, the write_lock() calls would
-be converted into spin_lock() calls.
-
-The list_del(), list_add(), and list_add_tail() primitives have been
-replaced by list_del_rcu(), list_add_rcu(), and list_add_tail_rcu().
-The _rcu() list-manipulation primitives add memory barriers that are
-needed on weakly ordered CPUs (most of them!). The list_del_rcu()
-primitive omits the pointer poisoning debug-assist code that would
-otherwise cause concurrent readers to fail spectacularly.
-
-So, when readers can tolerate stale data and when entries are either added
-or deleted, without in-place modification, it is very easy to use RCU!
-
-Example 2: Handling In-Place Updates
-------------------------------------
-
-The system-call auditing code does not update auditing rules in place.
-However, if it did, reader-writer-locked code to do so might look as
-follows (presumably, the field_count is only permitted to decrease,
-otherwise, the added fields would need to be filled in)::
-
- static inline int audit_upd_rule(struct audit_rule *rule,
- struct list_head *list,
- __u32 newaction,
- __u32 newfield_count)
- {
- struct audit_entry *e;
- struct audit_newentry *ne;
-
- write_lock(&auditsc_lock);
- /* Note: audit_netlink_sem held by caller. */
- list_for_each_entry(e, list, list) {
- if (!audit_compare_rule(rule, &e->rule)) {
- e->rule.action = newaction;
- e->rule.file_count = newfield_count;
- write_unlock(&auditsc_lock);
- return 0;
- }
- }
- write_unlock(&auditsc_lock);
- return -EFAULT; /* No matching rule */
- }
-
-The RCU version creates a copy, updates the copy, then replaces the old
-entry with the newly updated entry. This sequence of actions, allowing
-concurrent reads while doing a copy to perform an update, is what gives
-RCU ("read-copy update") its name. The RCU code is as follows::
-
- static inline int audit_upd_rule(struct audit_rule *rule,
- struct list_head *list,
- __u32 newaction,
- __u32 newfield_count)
- {
- struct audit_entry *e;
- struct audit_newentry *ne;
-
- list_for_each_entry(e, list, list) {
- if (!audit_compare_rule(rule, &e->rule)) {
- ne = kmalloc(sizeof(*entry), GFP_ATOMIC);
- if (ne == NULL)
- return -ENOMEM;
- audit_copy_rule(&ne->rule, &e->rule);
- ne->rule.action = newaction;
- ne->rule.file_count = newfield_count;
- list_replace_rcu(&e->list, &ne->list);
- call_rcu(&e->rcu, audit_free_rule);
- return 0;
- }
- }
- return -EFAULT; /* No matching rule */
- }
-
-Again, this assumes that the caller holds audit_netlink_sem. Normally,
-the reader-writer lock would become a spinlock in this sort of code.
-
-Example 3: Eliminating Stale Data
----------------------------------
-
-The auditing examples above tolerate stale data, as do most algorithms
-that are tracking external state. Because there is a delay from the
-time the external state changes before Linux becomes aware of the change,
-additional RCU-induced staleness is normally not a problem.
-
-However, there are many examples where stale data cannot be tolerated.
-One example in the Linux kernel is the System V IPC (see the ipc_lock()
-function in ipc/util.c). This code checks a "deleted" flag under a
-per-entry spinlock, and, if the "deleted" flag is set, pretends that the
-entry does not exist. For this to be helpful, the search function must
-return holding the per-entry spinlock, as ipc_lock() does in fact do.
-
-Quick Quiz:
- Why does the search function need to return holding the per-entry lock for
- this deleted-flag technique to be helpful?
-
-:ref:`Answer to Quick Quiz <answer_quick_quiz_list>`
-
-If the system-call audit module were to ever need to reject stale data,
-one way to accomplish this would be to add a "deleted" flag and a "lock"
-spinlock to the audit_entry structure, and modify audit_filter_task()
-as follows::
-
-	static enum audit_state audit_filter_task(struct task_struct *tsk)
-	{
-		struct audit_entry *e;
-		enum audit_state state;
-
-		rcu_read_lock();
-		list_for_each_entry_rcu(e, &audit_tsklist, list) {
-			if (audit_filter_rules(tsk, &e->rule, NULL, &state)) {
-				spin_lock(&e->lock);
-				if (e->deleted) {
-					spin_unlock(&e->lock);
-					rcu_read_unlock();
-					return AUDIT_BUILD_CONTEXT;
-				}
-				rcu_read_unlock();
-				return state;
-			}
-		}
-		rcu_read_unlock();
-		return AUDIT_BUILD_CONTEXT;
-	}
-
-Note that this example assumes that entries are only added and deleted.
-Additional mechanism is required to deal correctly with the
-update-in-place performed by audit_upd_rule(). For one thing,
-audit_upd_rule() would need additional memory barriers to ensure
-that the list_add_rcu() was really executed before the list_del_rcu().
-
-The audit_del_rule() function would need to set the "deleted"
-flag under the spinlock as follows::
-
-	static inline int audit_del_rule(struct audit_rule *rule,
-					 struct list_head *list)
-	{
-		struct audit_entry *e;
-
-		/* Do not need to use the _rcu iterator here, since this
-		 * is the only deletion routine. */
-		list_for_each_entry(e, list, list) {
-			if (!audit_compare_rule(rule, &e->rule)) {
-				spin_lock(&e->lock);
-				list_del_rcu(&e->list);
-				e->deleted = 1;
-				spin_unlock(&e->lock);
-				call_rcu(&e->rcu, audit_free_rule);
-				return 0;
-			}
-		}
-		return -EFAULT;		/* No matching rule */
-	}
-
-Summary
--------
-
-Read-mostly list-based data structures that can tolerate stale data are
-the most amenable to use of RCU. The simplest case is where entries are
-either added to or deleted from the data structure (or atomically modified
-in place), but non-atomic in-place modifications can be handled by making
-a copy, updating the copy, then replacing the original with the copy.
-If stale data cannot be tolerated, then a "deleted" flag may be used
-in conjunction with a per-entry spinlock in order to allow the search
-function to reject newly deleted data.
-
-.. _answer_quick_quiz_list:
-
-Answer to Quick Quiz:
- Why does the search function need to return holding the per-entry
- lock for this deleted-flag technique to be helpful?
-
- If the search function drops the per-entry lock before returning,
- then the caller will be processing stale data in any case. If it
- is really OK to be processing stale data, then you don't need a
- "deleted" flag. If processing stale data really is a problem,
- then you need to hold the per-entry lock across all of the code
- that uses the value that was returned.
--- /dev/null
+.. _rcu_doc:
+
+RCU Concepts
+============
+
+The basic idea behind RCU (read-copy update) is to split destructive
+operations into two parts, one that prevents anyone from seeing the data
+item being destroyed, and one that actually carries out the destruction.
+A "grace period" must elapse between the two parts, and this grace period
+must be long enough that any readers accessing the item being deleted have
+since dropped their references. For example, an RCU-protected deletion
+from a linked list would first remove the item from the list, wait for
+a grace period to elapse, then free the element. See the
+Documentation/RCU/listRCU.rst file for more information on using RCU with
+linked lists.
+
+Frequently Asked Questions
+--------------------------
+
+- Why would anyone want to use RCU?
+
+ The advantage of RCU's two-part approach is that RCU readers need
+ not acquire any locks, perform any atomic instructions, write to
+ shared memory, or (on CPUs other than Alpha) execute any memory
+ barriers. The fact that these operations are quite expensive
+ on modern CPUs is what gives RCU its performance advantages
+ in read-mostly situations. The fact that RCU readers need not
+ acquire locks can also greatly simplify deadlock-avoidance code.
+
+- How can the updater tell when a grace period has completed
+ if the RCU readers give no indication when they are done?
+
+ Just as with spinlocks, RCU readers are not permitted to
+ block, switch to user-mode execution, or enter the idle loop.
+ Therefore, as soon as a CPU is seen passing through any of these
+ three states, we know that that CPU has exited any previous RCU
+ read-side critical sections. So, if we remove an item from a
+ linked list, and then wait until all CPUs have switched context,
+ executed in user mode, or executed in the idle loop, we can
+ safely free up that item.
+
+ Preemptible variants of RCU (CONFIG_PREEMPT_RCU) get the
+ same effect, but require that the readers manipulate CPU-local
+ counters. These counters allow limited types of blocking within
+ RCU read-side critical sections. SRCU also uses CPU-local
+ counters, and permits general blocking within RCU read-side
+ critical sections. These variants of RCU detect grace periods
+ by sampling these counters.
+
+- If I am running on a uniprocessor kernel, which can only do one
+ thing at a time, why should I wait for a grace period?
+
+ See the Documentation/RCU/UP.rst file for more information.
+
+- How can I see where RCU is currently used in the Linux kernel?
+
+ Search for "rcu_read_lock", "rcu_read_unlock", "call_rcu",
+ "rcu_read_lock_bh", "rcu_read_unlock_bh", "srcu_read_lock",
+ "srcu_read_unlock", "synchronize_rcu", "synchronize_net",
+ "synchronize_srcu", and the other RCU primitives. Or grab one
+ of the cscope databases from:
+
+ (http://www.rdrop.com/users/paulmck/RCU/linuxusage/rculocktab.html).
+
+- What guidelines should I follow when writing code that uses RCU?
+
+ See the checklist.txt file in this directory.
+
+- Why the name "RCU"?
+
+ "RCU" stands for "read-copy update". The file Documentation/RCU/listRCU.rst
+ has more information on where this name came from; search for
+ "read-copy update" to find it.
+
+- I hear that RCU is patented? What is with that?
+
+ Yes, it is. There are several known patents related to RCU;
+ search for the string "Patent" in RTFP.txt to find them.
+ Of these, one was allowed to lapse by the assignee, and the
+ others have been contributed to the Linux kernel under GPL.
+ There are now also LGPL implementations of user-level RCU
+ available (http://liburcu.org/).
+
+- I hear that RCU needs work in order to support realtime kernels?
+
+ Realtime-friendly RCU can be enabled via the CONFIG_PREEMPT_RCU
+ kernel configuration parameter.
+
+- Where can I find more information on RCU?
+
+ See the RTFP.txt file in this directory.
+ Or point your browser at (http://www.rdrop.com/users/paulmck/RCU/).
+++ /dev/null
-.. _rcu_doc:
-
-RCU Concepts
-============
-
-The basic idea behind RCU (read-copy update) is to split destructive
-operations into two parts, one that prevents anyone from seeing the data
-item being destroyed, and one that actually carries out the destruction.
-A "grace period" must elapse between the two parts, and this grace period
-must be long enough that any readers accessing the item being deleted have
-since dropped their references. For example, an RCU-protected deletion
-from a linked list would first remove the item from the list, wait for
-a grace period to elapse, then free the element. See the
-Documentation/RCU/listRCU.rst file for more information on using RCU with
-linked lists.
-
-Frequently Asked Questions
---------------------------
-
-- Why would anyone want to use RCU?
-
- The advantage of RCU's two-part approach is that RCU readers need
- not acquire any locks, perform any atomic instructions, write to
- shared memory, or (on CPUs other than Alpha) execute any memory
- barriers. The fact that these operations are quite expensive
- on modern CPUs is what gives RCU its performance advantages
- in read-mostly situations. The fact that RCU readers need not
- acquire locks can also greatly simplify deadlock-avoidance code.
-
-- How can the updater tell when a grace period has completed
- if the RCU readers give no indication when they are done?
-
- Just as with spinlocks, RCU readers are not permitted to
- block, switch to user-mode execution, or enter the idle loop.
- Therefore, as soon as a CPU is seen passing through any of these
- three states, we know that that CPU has exited any previous RCU
- read-side critical sections. So, if we remove an item from a
- linked list, and then wait until all CPUs have switched context,
- executed in user mode, or executed in the idle loop, we can
- safely free up that item.
-
- Preemptible variants of RCU (CONFIG_PREEMPT_RCU) get the
- same effect, but require that the readers manipulate CPU-local
- counters. These counters allow limited types of blocking within
- RCU read-side critical sections. SRCU also uses CPU-local
- counters, and permits general blocking within RCU read-side
- critical sections. These variants of RCU detect grace periods
- by sampling these counters.
-
-- If I am running on a uniprocessor kernel, which can only do one
- thing at a time, why should I wait for a grace period?
-
- See the Documentation/RCU/UP.rst file for more information.
-
-- How can I see where RCU is currently used in the Linux kernel?
-
- Search for "rcu_read_lock", "rcu_read_unlock", "call_rcu",
- "rcu_read_lock_bh", "rcu_read_unlock_bh", "srcu_read_lock",
- "srcu_read_unlock", "synchronize_rcu", "synchronize_net",
- "synchronize_srcu", and the other RCU primitives. Or grab one
- of the cscope databases from:
-
- (http://www.rdrop.com/users/paulmck/RCU/linuxusage/rculocktab.html).
-
-- What guidelines should I follow when writing code that uses RCU?
-
- See the checklist.txt file in this directory.
-
-- Why the name "RCU"?
-
- "RCU" stands for "read-copy update". The file Documentation/RCU/listRCU.rst
- has more information on where this name came from; search for
- "read-copy update" to find it.
-
-- I hear that RCU is patented? What is with that?
-
- Yes, it is. There are several known patents related to RCU;
- search for the string "Patent" in RTFP.txt to find them.
- Of these, one was allowed to lapse by the assignee, and the
- others have been contributed to the Linux kernel under GPL.
- There are now also LGPL implementations of user-level RCU
- available (http://liburcu.org/).
-
-- I hear that RCU needs work in order to support realtime kernels?
-
- Realtime-friendly RCU can be enabled via the CONFIG_PREEMPT_RCU
- kernel configuration parameter.
-
-- Where can I find more information on RCU?
-
- See the RTFP.txt file in this directory.
- Or point your browser at (http://www.rdrop.com/users/paulmck/RCU/).