The restrack code relies on the fact that object structures are zeroed at
the allocation stage, the mlx4 CQ wasn't allocated with kzalloc and it
caused to the following crash.
[ 137.392209] general protection fault: 0000 [#1] SMP KASAN PTI
[ 137.392972] CPU: 0 PID: 622 Comm: ibv_rc_pingpong Tainted: G W
4.16.0-rc1-00099-g00313983cda6 #11
[ 137.395079] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-2.fc27 04/01/2014
[ 137.396866] RIP: 0010:rdma_restrack_del+0xc8/0xf0
[ 137.397762] RSP: 0018:
ffff8801b54e7968 EFLAGS:
00010206
[ 137.399008] RAX:
0000000000000000 RBX:
ffff8801d8bcbae8 RCX:
ffffffffb82314df
[ 137.400055] RDX:
dffffc0000000000 RSI:
dffffc0000000000 RDI:
70696b533d454741
[ 137.401103] RBP:
ffff8801d90c07a0 R08:
ffff8801d8bcbb00 R09:
0000000000000000
[ 137.402470] R10:
0000000000000001 R11:
ffffed0036a9cf52 R12:
ffff8801d90c0ad0
[ 137.403318] R13:
ffff8801d853fb20 R14:
ffff8801d8bcbb28 R15:
0000000000000014
[ 137.404736] FS:
00007fb415d43740(0000) GS:
ffff8801e5c00000(0000) knlGS:
0000000000000000
[ 137.406074] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
[ 137.407101] CR2:
00007fb41557df20 CR3:
00000001b580c001 CR4:
00000000003606b0
[ 137.408308] DR0:
0000000000000000 DR1:
0000000000000000 DR2:
0000000000000000
[ 137.409352] DR3:
0000000000000000 DR6:
00000000fffe0ff0 DR7:
0000000000000400
[ 137.410385] Call Trace:
[ 137.411058] ib_destroy_cq+0x23/0x60
[ 137.411460] uverbs_free_cq+0x37/0xa0
[ 137.412040] remove_commit_idr_uobject+0x38/0xf0
[ 137.413042] _rdma_remove_commit_uobject+0x5c/0x160
[ 137.413782] ? lookup_get_idr_uobject+0x39/0x50
[ 137.414737] rdma_remove_commit_uobject+0x3b/0x70
[ 137.415742] ib_uverbs_destroy_cq+0x114/0x1d0
[ 137.416260] ? ib_uverbs_req_notify_cq+0x160/0x160
[ 137.417073] ? kernel_text_address+0x5c/0x90
[ 137.417805] ? __kernel_text_address+0xe/0x30
[ 137.418766] ? unwind_get_return_address+0x2f/0x50
[ 137.419558] ib_uverbs_write+0x453/0x6a0
[ 137.420220] ? show_ibdev+0x90/0x90
[ 137.420653] ? __kasan_slab_free+0x136/0x180
[ 137.421155] ? kmem_cache_free+0x78/0x1e0
[ 137.422192] ? remove_vma+0x83/0x90
[ 137.422614] ? do_munmap+0x447/0x6c0
[ 137.423045] ? vm_munmap+0xb0/0x100
[ 137.423481] ? SyS_munmap+0x1d/0x30
[ 137.424120] ? do_syscall_64+0xeb/0x250
[ 137.424984] ? entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 137.425611] ? lru_add_drain_all+0x270/0x270
[ 137.426116] ? lru_add_drain_cpu+0xa3/0x170
[ 137.426616] ? lru_add_drain+0x11/0x20
[ 137.427058] ? free_pages_and_swap_cache+0xa6/0x120
[ 137.427672] ? tlb_flush_mmu_free+0x78/0x90
[ 137.428168] ? arch_tlb_finish_mmu+0x6d/0xb0
[ 137.428680] __vfs_write+0xc4/0x350
[ 137.430917] ? kernel_read+0xa0/0xa0
[ 137.432758] ? remove_vma+0x90/0x90
[ 137.434781] ? __kasan_slab_free+0x14b/0x180
[ 137.437486] ? remove_vma+0x83/0x90
[ 137.439836] ? kmem_cache_free+0x78/0x1e0
[ 137.442195] ? percpu_counter_add_batch+0x1d/0x90
[ 137.444389] vfs_write+0xf7/0x280
[ 137.446030] SyS_write+0xa1/0x120
[ 137.447867] ? SyS_read+0x120/0x120
[ 137.449670] ? mm_fault_error+0x180/0x180
[ 137.451539] ? _cond_resched+0x16/0x50
[ 137.453697] ? SyS_read+0x120/0x120
[ 137.455883] do_syscall_64+0xeb/0x250
[ 137.457686] entry_SYSCALL_64_after_hwframe+0x21/0x86
[ 137.459595] RIP: 0033:0x7fb415637b94
[ 137.461315] RSP: 002b:
00007ffdebea7d88 EFLAGS:
00000246 ORIG_RAX:
0000000000000001
[ 137.463879] RAX:
ffffffffffffffda RBX:
00005565022d1bd0 RCX:
00007fb415637b94
[ 137.466519] RDX:
0000000000000018 RSI:
00007ffdebea7da0 RDI:
0000000000000003
[ 137.469543] RBP:
00007ffdebea7d98 R08:
0000000000000000 R09:
00005565022d40c0
[ 137.472479] R10:
00000000000009cf R11:
0000000000000246 R12:
00005565022d2520
[ 137.475125] R13:
00000000000003e8 R14:
0000000000000000 R15:
00007ffdebea7fd0
[ 137.477760] Code: f7 e8 dd 0d 0b ff 48 c7 43 40 00 00 00 00 48 89 df e8 0d 0b 0b ff 48 8d 7b 28 c6 03 00 e8 41 0d 0b ff 48 8b 7b 28 48 85 ff 74 06 <f0> ff 4f 48 74 10 5b 48 89 ef 5d 41 5c 41 5d 41 5e e9 32 b0 ee
[ 137.483375] RIP: rdma_restrack_del+0xc8/0xf0 RSP:
ffff8801b54e7968
[ 137.486436] ---[ end trace
81835a1ea6722eed ]---
[ 137.488566] Kernel panic - not syncing: Fatal exception
[ 137.491162] Kernel Offset: 0x36000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
Fixes: 00313983cda6 ("RDMA/nldev: provide detailed CM_ID information")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>