pipe: fix poll/select race introduced by the pipe rework
author     Linus Torvalds <torvalds@linux-foundation.org>
Sat, 7 Dec 2019 18:41:17 +0000 (10:41 -0800)
committer  Linus Torvalds <torvalds@linux-foundation.org>
Sat, 7 Dec 2019 18:41:17 +0000 (10:41 -0800)

The kernel wait queues have a basic rule to them: you add yourself to
the wait-queue first, and then you check the things that you're going to
wait on.  That avoids the races with the event you're waiting for.

The same goes for poll/select logic: the "poll_wait()" goes first, and
then you check the things you're polling for.

Of course, if you use locking, the ordering doesn't matter since the
lock will serialize with anything that changes the state you're looking
at. That's not the case here, though.

So move the poll_wait() first in pipe_poll(), before you start looking
at the pipe state.

Fixes: 8cefc107ca54 ("pipe: Use head and tail pointers for the ring, not cursor and length")
Cc: David Howells <dhowells@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fs/pipe.c

index b901c8eefafd72829f76580cc6f13bbe5ed347f6..3e8b11e3b764bc3ebe4ee71feb6328357436a3bc 100644
--- a/fs/pipe.c
+++ b/fs/pipe.c
@@ -574,12 +574,24 @@ pipe_poll(struct file *filp, poll_table *wait)
 {
        __poll_t mask;
        struct pipe_inode_info *pipe = filp->private_data;
-       unsigned int head = READ_ONCE(pipe->head);
-       unsigned int tail = READ_ONCE(pipe->tail);
+       unsigned int head, tail;
 
+       /*
+        * Reading only -- no need for acquiring the semaphore.
+        *
+        * But because this is racy, the code has to add the
+        * entry to the poll table _first_ ..
+        */
        poll_wait(filp, &pipe->wait, wait);
 
-       /* Reading only -- no need for acquiring the semaphore.  */
+       /*
+        * .. and only then can you do the racy tests. That way,
+        * if something changes and you got it wrong, the poll
+        * table entry will wake you up and fix it.
+        */
+       head = READ_ONCE(pipe->head);
+       tail = READ_ONCE(pipe->tail);
+
        mask = 0;
        if (filp->f_mode & FMODE_READ) {
                if (!pipe_empty(head, tail))