crash in md-raid1 and md-raid10 due to incorrect list manipulation
Commit 55ce74d4bfe1b9444436264c637f39a152d1e5ac (md/raid1: ensure
device failure recorded before write request returns) causes a crash in
the LVM2 testsuite test shell/lvchange-raid.sh. For me the crash is 100%
reproducible.
The reason for the crash is that the newly added code in raid1d moves the
list from conf->bio_end_io_list to tmp, then tests whether tmp is non-empty
and then incorrectly pops the bio from conf->bio_end_io_list (which is empty
because the list was already moved).
Raid-10 has a similar bug.
Kernel Fault: Code=15 regs=000000006ccb8640 (Addr=0000000100000000)
CPU: 3 PID: 1930 Comm: mdX_raid1 Not tainted 4.2.0-rc5-bisect+ #35
task: 000000006cc1f258 ti: 000000006ccb8000 task.ti: 000000006ccb8000
     YZrvWESTHLNXBCVMcbcbcbcbOGFRQPDI
PSW: 00001000000001001111111000001111 Not tainted
r00-03  000000ff0804fe0f 000000001059d000 000000001059f818 000000007f16be38
r04-07  000000001059d000 000000007f16be08 0000000000200200 0000000000000001
r08-11  000000006ccb8260 000000007b7934d0 0000000000000001 0000000000000000
r12-15  000000004056f320 0000000000000000 0000000000013dd0 0000000000000000
r16-19  00000000f0d00ae0 0000000000000000 0000000000000000 0000000000000001
r20-23  000000000800000f 0000000042200390 0000000000000000 0000000000000000
r24-27  0000000000000001 000000000800000f 000000007f16be08 000000001059d000
r28-31  0000000100000000 000000006ccb8560 000000006ccb8640 0000000000000000
sr00-03  0000000000249800 0000000000000000 0000000000000000 0000000000249800
sr04-07  0000000000000000 0000000000000000 0000000000000000 0000000000000000
IASQ: 0000000000000000 0000000000000000 IAOQ: 000000001059f61c 000000001059f620
 IIR: 0f8010c6    ISR: 0000000000000000  IOR: 0000000100000000
 CPU:        3   CR30: 000000006ccb8000 CR31: 0000000000000000
 ORIG_R28: 000000001059d000
 IAOQ[0]: call_bio_endio+0x34/0x1a8 [raid1]
 IAOQ[1]: call_bio_endio+0x38/0x1a8 [raid1]
 RP(r2): raid_end_bio_io+0x88/0x168 [raid1]
Backtrace:
 [<000000001059f818>] raid_end_bio_io+0x88/0x168 [raid1]
 [<00000000105a4f64>] raid1d+0x144/0x1640 [raid1]
 [<000000004017fd5c>] kthread+0x144/0x160
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Fixes: 55ce74d4bfe1 ("md/raid1: ensure device failure recorded before write request returns.")
Fixes: 95af587e95aa ("md/raid10: ensure device failure recorded before write request returns.")
Signed-off-by: NeilBrown <neilb@suse.com>