17 Commits

Author SHA1 Message Date
Michał Kępień
626c84340d kernel: mtk_bmt: refactor to avoid deep recursion
A Linksys E8450 (mt7622) device running current master has recently
started crashing:

    [    0.562900] mtk-ecc 1100e000.ecc: probed
    [    0.570254] spi-nand spi2.0: Fidelix SPI NAND was found.
    [    0.575576] spi-nand spi2.0: 128 MiB, block size: 128 KiB, page size: 2048, OOB size: 64
    [    0.583780] mtk-snand 1100d000.spi: ECC strength: 4 bits per 512 bytes
    [    0.682930] Insufficient stack space to handle exception!
    [    0.682939] ESR: 0x0000000096000047 -- DABT (current EL)
    [    0.682946] FAR: 0xffffffc008c47fe0
    [    0.682948] Task stack:     [0xffffffc008c48000..0xffffffc008c4c000]
    [    0.682951] IRQ stack:      [0xffffffc008008000..0xffffffc00800c000]
    [    0.682954] Overflow stack: [0xffffff801feb00a0..0xffffff801feb10a0]
    [    0.682959] CPU: 1 PID: 1 Comm: swapper/0 Tainted: G S                5.15.107 #0
    [    0.682966] Hardware name: Linksys E8450 (DT)
    [    0.682969] pstate: 800000c5 (Nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
    [    0.682975] pc : dequeue_entity+0x0/0x250
    [    0.682988] lr : dequeue_task_fair+0x98/0x290
    [    0.682992] sp : ffffffc008c48030
    [    0.682994] x29: ffffffc008c48030 x28: 0000000000000001 x27: ffffff801feb6380
    [    0.683004] x26: 0000000000000001 x25: ffffff801feb6300 x24: ffffff8000068000
    [    0.683011] x23: 0000000000000001 x22: 0000000000000009 x21: 0000000000000000
    [    0.683017] x20: ffffff801feb6380 x19: ffffff8000068080 x18: 0000000017a740a6
    [    0.683024] x17: ffffffc008bae748 x16: ffffffc008bae6d8 x15: ffffffffffffffff
    [    0.683031] x14: ffffffffffffffff x13: 0000000000000000 x12: 0000000f00000101
    [    0.683038] x11: 0000000000000449 x10: 0000000000000127 x9 : 0000000000000000
    [    0.683044] x8 : 0000000000000125 x7 : 0000000000116da1 x6 : 0000000000116da1
    [    0.683051] x5 : 00000000001165a1 x4 : ffffff801feb6e00 x3 : 0000000000000000
    [    0.683058] x2 : 0000000000000009 x1 : ffffff8000068080 x0 : ffffff801feb6380
    [    0.683066] Kernel panic - not syncing: kernel stack overflow
    [    0.683069] SMP: stopping secondary CPUs
    [    1.648361] SMP: failed to stop secondary CPUs 0-1
    [    1.648366] Kernel Offset: disabled
    [    1.648368] CPU features: 0x00003000,00000802
    [    1.648372] Memory Limit: none

Several factors contributed to this issue:

 1. The mtk_bmt driver recursively calls its scan_bmt() helper function
    during device initialization, while looking for a valid block
    mapping table (BMT).

 2. Commit fa4dc86e98 ("kernel: backport MEMREAD ioctl"):

      - increased the size of some stack-allocated structures (like
	struct mtd_oob_ops, used in bbt_nand_read(), which is indirectly
	called from scan_bmt()),

      - increased the stack size for some functions (for example,
	spinand_mtd_read(), which is indirectly called from scan_bmt(),
	now uses an extra stack-allocated struct mtd_ecc_stats).

 3. OpenWrt currently compiles the kernel with the
    -fno-optimize-sibling-calls flag, which prevents tail-call
    optimization.

Collectively, all of these factors caused stack usage in the mtk_bmt
driver to grow excessively large, triggering stack overflows.

Recursion is not really necessary in scan_bmt() as it simply iterates
over flash memory blocks in reverse order, looking for a valid BMT.
Refactor the logic contained in the scan_bmt() and read_bmt() functions
in target/linux/generic/files/drivers/mtd/nand/mtk_bmt_v2.c so that deep
recursion is prevented (and therefore also any potential stack overflows
it may cause).

Link: https://lists.openwrt.org/pipermail/openwrt-devel/2023-April/040872.html
Signed-off-by: Michał Kępień <openwrt@kempniu.pl>
2023-04-29 12:58:48 +02:00
Chuanhong Guo
f183ce35b8
kernel: mtk-bmt: fix usage of _oob_read
_oob_read returns number of bitflips on success while
bbt_nand_read should return 0.

Fixes: 2d49e49b18 ("mediatek: bmt: use generic mtd api")
Signed-off-by: Chuanhong Guo <gch981213@gmail.com>
2023-01-21 10:54:23 +08:00
Chuanhong Guo
6fa50e26e7 kernel: mtk_bmt: skip bitflip check if threshold isn't set
kernel spi-nand driver leaves this field empty and let mtd set it later.

Signed-off-by: Chuanhong Guo <gch981213@gmail.com>
2022-04-09 21:08:26 +08:00
Felix Fietkau
2a8a333ee9 kernel: mtk_bmt: add debugfs file to attempt repair of remapped sectors
This can be used for sectors that are not physically damaged

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-03-25 21:11:09 +01:00
Felix Fietkau
06382d1af7 kernel: add support for mediatek NMBM flash mapping support
This NAND flash remapping method is used on newer MediaTek devices with NAND
flash.

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-03-25 21:11:09 +01:00
Felix Fietkau
be1f2b4d9d kernel: mtk_bmt: on error, do not attempt to remap out-of-range blocks
Pass errors to caller instead

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-03-25 21:11:09 +01:00
Felix Fietkau
7d1e2be160 kernel: mtk_bmt: fix block copying on remap with bmt v2
Copy from the previously mapped block (in case it was remapped already)

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-03-25 21:11:09 +01:00
Felix Fietkau
b4c7f8c5f7 kernel: mtk_bmt: allow get_mapping_block to return an error
Used by the mapping implementation to indicate that no backing block is
available

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-03-25 21:11:09 +01:00
Felix Fietkau
601c7b4adb kernel: split up mtk_bmt driver code
Keep a separate source file per variant

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-03-25 21:11:09 +01:00
Felix Fietkau
e2aa04d1e5 kernel: mtk_bmt: add support for the bbt format used on mt7621
This does not have spare blocks for remapping, and it is also not suitable
for random write access. It only skips over bad blocks on linear writes of an
image to a partition. As such, it is really only suitable for the kernel
partition, or other partitions with mostly static data

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-01-13 18:33:06 +01:00
Felix Fietkau
ff6b4f2cfb kernel: mtk_bmt: add abstraction for supporting other formats
Preparation for supporting remapping on MT7621

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-01-13 18:33:06 +01:00
Felix Fietkau
7235c8d00c kernel: mtk_bmt: remap blocks after reaching bitflip threshold
This ensures that blocks are remapped before data becomes corrupt

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-01-13 18:33:06 +01:00
Felix Fietkau
0ddead0897 kernel: mtk_bmt: pass number of bitflips on read to the caller
Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-01-13 18:33:06 +01:00
Felix Fietkau
b86452f841 kernel: mtk_bmt: add support for limiting range of remapping
This can be used to support ubi on top of mtk_bmt without reflashing the
boot loader. The boot loader + factory + kernel area is covered, while the
rest is passed through as-is

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-01-13 18:33:06 +01:00
Felix Fietkau
31b6cfb288 kernel: mtk_bmt: extend debug interface
Add support for showing remapped blocks and garbage collecting old
remapped blocks triggered by using the mark_good/mark_bad files

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-01-13 18:33:06 +01:00
Felix Fietkau
fde2421597 kernel: mtk_bmt: fix remapping after read/write failure
Copy from the previous block in order to preserve existing data

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-01-13 18:33:06 +01:00
Felix Fietkau
bcf91fe884 kernel: move mediatek BMT support patch to generic patches
Preparation for supporting BMT on MT7621. Move source files to the files/
subdirectory in order to simplify maintenance

Signed-off-by: Felix Fietkau <nbd@nbd.name>
2022-01-13 18:33:06 +01:00