kernel: mtk_bmt: refactor to avoid deep recursion
A Linksys E8450 (mt7622) device running current master has recently
started crashing:
[ 0.562900] mtk-ecc
1100e000.ecc: probed
[ 0.570254] spi-nand spi2.0: Fidelix SPI NAND was found.
[ 0.575576] spi-nand spi2.0: 128 MiB, block size: 128 KiB, page size: 2048, OOB size: 64
[ 0.583780] mtk-snand
1100d000.spi: ECC strength: 4 bits per 512 bytes
[ 0.682930] Insufficient stack space to handle exception!
[ 0.682939] ESR: 0x0000000096000047 -- DABT (current EL)
[ 0.682946] FAR: 0xffffffc008c47fe0
[ 0.682948] Task stack: [0xffffffc008c48000..0xffffffc008c4c000]
[ 0.682951] IRQ stack: [0xffffffc008008000..0xffffffc00800c000]
[ 0.682954] Overflow stack: [0xffffff801feb00a0..0xffffff801feb10a0]
[ 0.682959] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G S 5.15.107 #0
[ 0.682966] Hardware name: Linksys E8450 (DT)
[ 0.682969] pstate:
800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.682975] pc : dequeue_entity+0x0/0x250
[ 0.682988] lr : dequeue_task_fair+0x98/0x290
[ 0.682992] sp :
ffffffc008c48030
[ 0.682994] x29:
ffffffc008c48030 x28:
0000000000000001 x27:
ffffff801feb6380
[ 0.683004] x26:
0000000000000001 x25:
ffffff801feb6300 x24:
ffffff8000068000
[ 0.683011] x23:
0000000000000001 x22:
0000000000000009 x21:
0000000000000000
[ 0.683017] x20:
ffffff801feb6380 x19:
ffffff8000068080 x18:
0000000017a740a6
[ 0.683024] x17:
ffffffc008bae748 x16:
ffffffc008bae6d8 x15:
ffffffffffffffff
[ 0.683031] x14:
ffffffffffffffff x13:
0000000000000000 x12:
0000000f00000101
[ 0.683038] x11:
0000000000000449 x10:
0000000000000127 x9 :
0000000000000000
[ 0.683044] x8 :
0000000000000125 x7 :
0000000000116da1 x6 :
0000000000116da1
[ 0.683051] x5 :
00000000001165a1 x4 :
ffffff801feb6e00 x3 :
0000000000000000
[ 0.683058] x2 :
0000000000000009 x1 :
ffffff8000068080 x0 :
ffffff801feb6380
[ 0.683066] Kernel panic - not syncing: kernel stack overflow
[ 0.683069] SMP: stopping secondary CPUs
[ 1.648361] SMP: failed to stop secondary CPUs 0-1
[ 1.648366] Kernel Offset: disabled
[ 1.648368] CPU features: 0x00003000,
00000802
[ 1.648372] Memory Limit: none
Several factors contributed to this issue:
1. The mtk_bmt driver recursively calls its scan_bmt() helper function
during device initialization, while looking for a valid block
mapping table (BMT).
2. Commit
fa4dc86e98 ("kernel: backport MEMREAD ioctl"):
- increased the size of some stack-allocated structures (like
struct mtd_oob_ops, used in bbt_nand_read(), which is indirectly
called from scan_bmt()),
- increased the stack size for some functions (for example,
spinand_mtd_read(), which is indirectly called from scan_bmt(),
now uses an extra stack-allocated struct mtd_ecc_stats).
3. OpenWrt currently compiles the kernel with the
-fno-optimize-sibling-calls flag, which prevents tail-call
optimization.
Collectively, all of these factors caused stack usage in the mtk_bmt
driver to grow excessively large, triggering stack overflows.
Recursion is not really necessary in scan_bmt() as it simply iterates
over flash memory blocks in reverse order, looking for a valid BMT.
Refactor the logic contained in the scan_bmt() and read_bmt() functions
in target/linux/generic/files/drivers/mtd/nand/mtk_bmt_v2.c so that deep
recursion is prevented (and therefore also any potential stack overflows
it may cause).
Link: https://lists.openwrt.org/pipermail/openwrt-devel/2023-April/040872.html
Signed-off-by: Michał Kępień <openwrt@kempniu.pl>