Michał Kępień
626c84340d
kernel: mtk_bmt: refactor to avoid deep recursion
A Linksys E8450 (mt7622) device running current master has recently
started crashing:
[ 0.562900] mtk-ecc 1100e000.ecc: probed
[ 0.570254] spi-nand spi2.0: Fidelix SPI NAND was found.
[ 0.575576] spi-nand spi2.0: 128 MiB, block size: 128 KiB, page size: 2048, OOB size: 64
[ 0.583780] mtk-snand 1100d000.spi: ECC strength: 4 bits per 512 bytes
[ 0.682930] Insufficient stack space to handle exception!
[ 0.682939] ESR: 0x0000000096000047 -- DABT (current EL)
[ 0.682946] FAR: 0xffffffc008c47fe0
[ 0.682948] Task stack: [0xffffffc008c48000..0xffffffc008c4c000]
[ 0.682951] IRQ stack: [0xffffffc008008000..0xffffffc00800c000]
[ 0.682954] Overflow stack: [0xffffff801feb00a0..0xffffff801feb10a0]
[ 0.682959] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G S 5.15.107 #0
[ 0.682966] Hardware name: Linksys E8450 (DT)
[ 0.682969] pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.682975] pc : dequeue_entity+0x0/0x250
[ 0.682988] lr : dequeue_task_fair+0x98/0x290
[ 0.682992] sp : ffffffc008c48030
[ 0.682994] x29: ffffffc008c48030 x28: 0000000000000001 x27: ffffff801feb6380
[ 0.683004] x26: 0000000000000001 x25: ffffff801feb6300 x24: ffffff8000068000
[ 0.683011] x23: 0000000000000001 x22: 0000000000000009 x21: 0000000000000000
[ 0.683017] x20: ffffff801feb6380 x19: ffffff8000068080 x18: 0000000017a740a6
[ 0.683024] x17: ffffffc008bae748 x16: ffffffc008bae6d8 x15: ffffffffffffffff
[ 0.683031] x14: ffffffffffffffff x13: 0000000000000000 x12: 0000000f00000101
[ 0.683038] x11: 0000000000000449 x10: 0000000000000127 x9 : 0000000000000000
[ 0.683044] x8 : 0000000000000125 x7 : 0000000000116da1 x6 : 0000000000116da1
[ 0.683051] x5 : 00000000001165a1 x4 : ffffff801feb6e00 x3 : 0000000000000000
[ 0.683058] x2 : 0000000000000009 x1 : ffffff8000068080 x0 : ffffff801feb6380
[ 0.683066] Kernel panic - not syncing: kernel stack overflow
[ 0.683069] SMP: stopping secondary CPUs
[ 1.648361] SMP: failed to stop secondary CPUs 0-1
[ 1.648366] Kernel Offset: disabled
[ 1.648368] CPU features: 0x00003000,00000802
[ 1.648372] Memory Limit: none
Several factors contributed to this issue:
  1. The mtk_bmt driver recursively calls its scan_bmt() helper function
     during device initialization, while looking for a valid block
     mapping table (BMT).
  2. Commit fa4dc86e98 ("kernel: backport MEMREAD ioctl"):
       - increased the size of some stack-allocated structures (like
         struct mtd_oob_ops, used in bbt_nand_read(), which is indirectly
         called from scan_bmt()),
       - increased stack usage in some functions (for example,
         spinand_mtd_read(), which is indirectly called from scan_bmt(),
         now uses an extra stack-allocated struct mtd_ecc_stats).
  3. OpenWrt currently compiles the kernel with the
     -fno-optimize-sibling-calls flag, which prevents tail-call
     optimization.
Collectively, all of these factors caused stack usage in the mtk_bmt
driver to grow excessively large, triggering stack overflows.
Recursion is not really necessary in scan_bmt() as it simply iterates
over flash memory blocks in reverse order, looking for a valid BMT.
Refactor the logic contained in the scan_bmt() and read_bmt() functions
in target/linux/generic/files/drivers/mtd/nand/mtk_bmt_v2.c so that deep
recursion is prevented (and therefore also any potential stack overflows
it may cause).
Link: https://lists.openwrt.org/pipermail/openwrt-devel/2023-April/040872.html
Signed-off-by: Michał Kępień <openwrt@kempniu.pl>
2023-04-29 12:58:48 +02:00
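The refactoring described in the commit above replaces a recursive scan with a loop. The following is only a minimal sketch of that idea, not the driver's actual code: `scan_bmt_iterative`, `bmt_valid_fn`, and `only_block_is_valid` are hypothetical names, and the validity check stands in for reading a block from flash and verifying that it holds a BMT.

```c
#include <stdbool.h>

/* Hypothetical stand-in for "read this block and check whether it holds a valid BMT". */
typedef bool (*bmt_valid_fn)(int block, void *ctx);

/*
 * Walk blocks from `start` down to `floor` (inclusive) and return the first
 * block for which is_valid() succeeds, or -1 if no block qualifies.
 * A single stack frame is used regardless of how many blocks are skipped,
 * whereas a recursive version adds one frame per skipped block.
 */
static int scan_bmt_iterative(int start, int floor, bmt_valid_fn is_valid, void *ctx)
{
    for (int block = start; block >= floor; block--) {
        if (is_valid(block, ctx))
            return block;
    }
    return -1;
}

/* Example validator (assumption): ctx points at the index of the only valid block. */
static bool only_block_is_valid(int block, void *ctx)
{
    return block == *(const int *)ctx;
}
```

Calling `scan_bmt_iterative(1023, 0, only_block_is_valid, &good)` with `good` set to 1000 skips 23 blocks without growing the stack. This matters when the compiler is not allowed to turn tail calls into jumps.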
Chuanhong Guo
f183ce35b8
kernel: mtk-bmt: fix usage of _oob_read
_oob_read returns the number of bitflips on success, whereas
bbt_nand_read should return 0.
Fixes: 2d49e49b18 ("mediatek: bmt: use generic mtd api")
Signed-off-by: Chuanhong Guo <gch981213@gmail.com>
2023-01-21 10:54:23 +08:00
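The return-value mismatch fixed by the commit above can be illustrated with a small sketch. It is not the driver's code: `fake_oob_read` and `bbt_read_sketch` are hypothetical names, and the convention that failures are reported as negative values is an assumption made for the example.

```c
/*
 * Hypothetical stand-in for the low-level OOB read. Assumed convention:
 * a negative value on failure, otherwise the number of corrected
 * bitflips (zero or more).
 */
static int fake_oob_read(int simulated_result)
{
    return simulated_result;
}

/*
 * Wrapper whose callers expect 0 on success. A positive bitflip count must
 * not leak through, or callers testing `if (ret)` would treat a successful
 * read as a failure.
 */
static int bbt_read_sketch(int simulated_result)
{
    int ret = fake_oob_read(simulated_result);

    if (ret < 0)
        return ret; /* propagate the error unchanged */

    return 0;       /* success, regardless of how many bitflips were corrected */
}
```

With this shape, a read that corrected three bitflips still reports success, while a negative error code is passed to the caller untouched.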
Chuanhong Guo
6fa50e26e7
kernel: mtk_bmt: skip bitflip check if threshold isn't set
The kernel spi-nand driver leaves this field empty and lets mtd set it later.
Signed-off-by: Chuanhong Guo <gch981213@gmail.com>
2022-04-09 21:08:26 +08:00
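The guard described in the commit above might look roughly like the sketch below. `needs_remap` is a hypothetical helper, and both the use of 0 to mean "threshold not set" and the `>=` comparison are assumptions for illustration.

```c
#include <stdbool.h>

/*
 * Decide whether a read that corrected `bitflips` bits should trigger a remap.
 * A threshold of 0 is treated as "not set" (assumption), so the check is
 * skipped entirely instead of firing on every read.
 */
static bool needs_remap(unsigned int bitflips, unsigned int threshold)
{
    if (threshold == 0)
        return false;

    return bitflips >= threshold;
}
```

Without the early return, a zero threshold would make every read, even one with no bitflips at all, satisfy `bitflips >= threshold`.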
Felix Fietkau
2a8a333ee9
kernel: mtk_bmt: add debugfs file to attempt repair of remapped sectors
This can be used for sectors that are not physically damaged
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-03-25 21:11:09 +01:00
Felix Fietkau
06382d1af7
kernel: add support for mediatek NMBM flash mapping support
This NAND flash remapping method is used on newer MediaTek devices with NAND
flash.
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-03-25 21:11:09 +01:00
Felix Fietkau
be1f2b4d9d
kernel: mtk_bmt: on error, do not attempt to remap out-of-range blocks
Pass errors to caller instead
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-03-25 21:11:09 +01:00
Felix Fietkau
7d1e2be160
kernel: mtk_bmt: fix block copying on remap with bmt v2
Copy from the previously mapped block (in case it was remapped already)
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-03-25 21:11:09 +01:00
Felix Fietkau
b4c7f8c5f7
kernel: mtk_bmt: allow get_mapping_block to return an error
Used by the mapping implementation to indicate that no backing block is
available
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-03-25 21:11:09 +01:00
Felix Fietkau
601c7b4adb
kernel: split up mtk_bmt driver code
Keep a separate source file per variant
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-03-25 21:11:09 +01:00
Felix Fietkau
e2aa04d1e5
kernel: mtk_bmt: add support for the bbt format used on mt7621
This does not have spare blocks for remapping, and it is also not suitable
for random write access. It only skips over bad blocks on linear writes of an
image to a partition. As such, it is really only suitable for the kernel
partition, or other partitions with mostly static data
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-01-13 18:33:06 +01:00
Felix Fietkau
ff6b4f2cfb
kernel: mtk_bmt: add abstraction for supporting other formats
Preparation for supporting remapping on MT7621
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-01-13 18:33:06 +01:00
Felix Fietkau
7235c8d00c
kernel: mtk_bmt: remap blocks after reaching bitflip threshold
This ensures that blocks are remapped before data becomes corrupt
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-01-13 18:33:06 +01:00
Felix Fietkau
0ddead0897
kernel: mtk_bmt: pass number of bitflips on read to the caller
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-01-13 18:33:06 +01:00
Felix Fietkau
b86452f841
kernel: mtk_bmt: add support for limiting range of remapping
This can be used to support ubi on top of mtk_bmt without reflashing the
boot loader. The boot loader + factory + kernel area is covered, while the
rest is passed through as-is
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-01-13 18:33:06 +01:00
Felix Fietkau
31b6cfb288
kernel: mtk_bmt: extend debug interface
Add support for showing remapped blocks and garbage collecting old
remapped blocks triggered by using the mark_good/mark_bad files
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-01-13 18:33:06 +01:00
Felix Fietkau
fde2421597
kernel: mtk_bmt: fix remapping after read/write failure
Copy from the previous block in order to preserve existing data
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-01-13 18:33:06 +01:00
Felix Fietkau
bcf91fe884
kernel: move mediatek BMT support patch to generic patches
Preparation for supporting BMT on MT7621. Move source files to the files/
subdirectory in order to simplify maintenance
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-01-13 18:33:06 +01:00