mirror of
https://github.com/openwrt/openwrt.git
synced 2025-01-17 10:20:01 +00:00
2a7bdde29b
Move MGLRU patches from pending to backport as they got merged upstream. These are direct ports from one of the devs, so it's better to just move them than to try to backport them again from upstream.

Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
162 lines
6.3 KiB
Diff
From f59c618ed70a1e48accc4cad91a200966f2569c9 Mon Sep 17 00:00:00 2001
From: Yu Zhao <yuzhao@google.com>
Date: Tue, 2 Feb 2021 01:27:45 -0700
Subject: [PATCH 10/10] mm: multigenerational lru: documentation

Add Documentation/vm/multigen_lru.rst.

Signed-off-by: Yu Zhao <yuzhao@google.com>
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
Change-Id: I1902178bcbb5adfa0a748c4d284a6456059bdd7e
---
Documentation/vm/index.rst | 1 +
Documentation/vm/multigen_lru.rst | 132 ++++++++++++++++++++++++++++++
2 files changed, 133 insertions(+)
create mode 100644 Documentation/vm/multigen_lru.rst

--- a/Documentation/vm/index.rst
+++ b/Documentation/vm/index.rst
@@ -17,6 +17,7 @@ various features of the Linux memory man

swap_numa
zswap
+ multigen_lru

Kernel developers MM documentation
==================================
--- /dev/null
+++ b/Documentation/vm/multigen_lru.rst
@@ -0,0 +1,132 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=====================
+Multigenerational LRU
+=====================
+
+Quick Start
+===========
+Build Configurations
+--------------------
+:Required: Set ``CONFIG_LRU_GEN=y``.
+
+:Optional: Set ``CONFIG_LRU_GEN_ENABLED=y`` to turn the feature on by
+ default.
+
+Runtime Configurations
+----------------------
+:Required: Write ``1`` to ``/sys/kernel/mm/lru_gen/enable`` if the
+ feature was not turned on by default.
+
+:Optional: Write ``N`` to ``/sys/kernel/mm/lru_gen/min_ttl_ms`` to
+ protect the working set of the last ``N`` milliseconds. The OOM killer is
+ invoked if this working set cannot be kept in memory.
+
+:Optional: Read ``/sys/kernel/debug/lru_gen`` to confirm the feature
+ is turned on. This file has the following output:
+
+::
+
+ memcg memcg_id memcg_path
+ node node_id
+ min_gen birth_time anon_size file_size
+ ...
+ max_gen birth_time anon_size file_size
+
+``min_gen`` is the oldest generation number and ``max_gen`` is the
+youngest generation number. ``birth_time`` is in milliseconds.
+``anon_size`` and ``file_size`` are in pages.
+
+Phones/Laptops/Workstations
+---------------------------
+No additional configuration is required.
+
+Servers/Data Centers
+--------------------
+:To support more generations: Change ``CONFIG_NR_LRU_GENS`` to a
+ larger number.
+
+:To support more tiers: Change ``CONFIG_TIERS_PER_GEN`` to a larger
+ number.
+
+:To support full stats: Set ``CONFIG_LRU_GEN_STATS=y``.
+
+:Working set estimation: Write ``+ memcg_id node_id max_gen
+ [swappiness] [use_bloom_filter]`` to ``/sys/kernel/debug/lru_gen`` to
+ invoke the aging, which scans PTEs for accessed pages and then
+ creates the next generation ``max_gen+1``. A swap file and a non-zero
+ ``swappiness``, which overrides ``vm.swappiness``, are required to
+ scan PTEs mapping anon pages. Set ``use_bloom_filter`` to 0 to
+ override the default behavior which only scans PTE tables found
+ populated.
+
+:Proactive reclaim: Write ``- memcg_id node_id min_gen [swappiness]
+ [nr_to_reclaim]`` to ``/sys/kernel/debug/lru_gen`` to invoke the
+ eviction, which evicts generations less than or equal to ``min_gen``.
+ ``min_gen`` should be less than ``max_gen-1`` as ``max_gen`` and
+ ``max_gen-1`` are not fully aged and therefore cannot be evicted.
+ Use ``nr_to_reclaim`` to limit the number of pages to evict. Multiple
+ command lines are supported, as is concatenation with delimiters
+ ``,`` and ``;``.
+
+Framework
+=========
+For each ``lruvec``, evictable pages are divided into multiple
+generations. The youngest generation number is stored in
+``lrugen->max_seq`` for both anon and file types as they are aged on
+an equal footing. The oldest generation numbers are stored in
+``lrugen->min_seq[]`` separately for anon and file types as clean
+file pages can be evicted regardless of swap and writeback
+constraints. These three variables are monotonically increasing.
+Generation numbers are truncated into
+``order_base_2(CONFIG_NR_LRU_GENS+1)`` bits in order to fit into
+``page->flags``. The sliding window technique is used to prevent
+truncated generation numbers from overlapping. Each truncated
+generation number is an index to an array of per-type and per-zone
+lists ``lrugen->lists``.
+
+Each generation is divided into multiple tiers. Tiers represent
+different ranges of numbers of accesses from file descriptors only.
+Pages accessed ``N`` times via file descriptors belong to tier
+``order_base_2(N)``. Each generation contains at most
+``CONFIG_TIERS_PER_GEN`` tiers, and they require additional
+``CONFIG_TIERS_PER_GEN-2`` bits in ``page->flags``. In contrast to
+moving between generations which requires list operations, moving
+between tiers only involves operations on ``page->flags`` and
+therefore has a negligible cost. A feedback loop modeled after a PID
+controller monitors the refaulted % across all tiers and decides when
+and in which tiers to protect pages.
+
+The framework comprises two conceptually independent components: the
+aging and the eviction, which can be invoked separately from user
+space for the purpose of working set estimation and proactive reclaim.
+
+Aging
+-----
+The aging produces young generations. Given an ``lruvec``, the aging
+traverses ``lruvec_memcg()->mm_list`` and calls ``walk_page_range()``
+to scan PTEs for accessed pages (a ``mm_struct`` list is maintained
+for each ``memcg``). Upon finding one, the aging updates its
+generation number to ``max_seq`` (modulo ``CONFIG_NR_LRU_GENS``).
+After each round of traversal, the aging increments ``max_seq``. The
+aging is due when ``min_seq[]`` reaches ``max_seq-1``.
+
+Eviction
+--------
+The eviction consumes old generations. Given an ``lruvec``, the
+eviction scans pages on the per-zone lists indexed by anon and file
+``min_seq[]`` (modulo ``CONFIG_NR_LRU_GENS``). It first tries to
+select a type based on the values of ``min_seq[]``. If they are
+equal, it selects the type that has a lower refaulted %. The eviction
+sorts a page according to its updated generation number if the aging
+has found this page accessed. It also moves a page to the next
+generation if this page is from an upper tier that has a higher
+refaulted % than the base tier. The eviction increments ``min_seq[]``
+of a selected type when it finds all the per-zone lists indexed by
+``min_seq[]`` of this selected type are empty.
+
+To-do List
+==========
+KVM Optimization
+----------------
+Support shadow page table walk.
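
The Runtime Configurations steps in the documentation added by this patch come down to writing short decimal strings into sysfs files. The C sketch below is illustrative only: it assumes the ``/sys/kernel/mm/lru_gen/`` files named in the patch exist (that is, the kernel was built with ``CONFIG_LRU_GEN=y``) and that the caller runs as root. ``write_knob`` is a hypothetical helper, not a kernel or libc interface.

```c
#include <assert.h>
#include <stdio.h>

/* Paths named in the Quick Start section of the patch. */
#define LRU_GEN_ENABLE     "/sys/kernel/mm/lru_gen/enable"
#define LRU_GEN_MIN_TTL_MS "/sys/kernel/mm/lru_gen/min_ttl_ms"

/*
 * Hypothetical helper: write one value plus a newline to a sysfs-style file.
 * Returns 0 on success and -1 on any error.
 */
static int write_knob(const char *path, const char *value)
{
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%s\n", value) < 0) {
		fclose(f);
		return -1;
	}
	/* stdio buffers the write, so a kernel-side error surfaces at fclose(). */
	return fclose(f) == 0 ? 0 : -1;
}
```

For example, ``write_knob(LRU_GEN_ENABLE, "1")`` turns the feature on when it was not enabled by default, and ``write_knob(LRU_GEN_MIN_TTL_MS, "1000")`` asks the kernel to protect the working set of the last 1000 milliseconds.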
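
The proactive-reclaim command described under Servers/Data Centers is plain text, so user space can assemble it with ``snprintf()``. This sketch assumes the caller already knows ``max_gen`` (for example from reading ``/sys/kernel/debug/lru_gen``) and uses it only to enforce the documented rule that ``min_gen`` must be less than ``max_gen-1``. ``fmt_evict`` and ``append_cmd`` are hypothetical names, and always emitting both optional fields is a simplification of this sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one proactive-reclaim command,
 *   "- memcg_id node_id min_gen swappiness nr_to_reclaim",
 * always emitting both optional fields. Returns the string length, or -1
 * if min_gen is too young to evict or buf is too small.
 */
static int fmt_evict(char *buf, size_t len, unsigned int memcg_id, int node_id,
		     unsigned long min_gen, unsigned long max_gen,
		     int swappiness, unsigned long nr_to_reclaim)
{
	int n;

	/* max_gen and max_gen-1 are not fully aged: need min_gen < max_gen-1. */
	if (min_gen + 1 >= max_gen)
		return -1;
	n = snprintf(buf, len, "- %u %d %lu %d %lu", memcg_id, node_id,
		     min_gen, swappiness, nr_to_reclaim);
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/*
 * Hypothetical helper: append cmd to the NUL-terminated buf, joined by ';'.
 * Returns 0 on success; on -1, buf may hold a truncated command.
 */
static int append_cmd(char *buf, size_t len, const char *cmd)
{
	size_t used = strlen(buf);
	int n = snprintf(buf + used, len - used, "%s%s", used ? ";" : "", cmd);

	return (n < 0 || (size_t)n >= len - used) ? -1 : 0;
}
```

The combined string would then be written to ``/sys/kernel/debug/lru_gen`` in one write, which requires debugfs to be mounted. Joining with ``;`` rather than ``,`` is an arbitrary choice here; the documentation accepts either.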
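
The Framework section packs monotonically increasing generation numbers into a few bits of ``page->flags``. The C sketch below illustrates that arithmetic with a hypothetical ``NR_LRU_GENS`` of 4 standing in for ``CONFIG_NR_LRU_GENS``: ``order_base_2(NR_LRU_GENS+1)`` gives 3 bits, ``gen_index()`` truncates a sequence number to a list index, and ``window_ok()`` expresses the sliding-window condition under which live generations never share an index. The helper names are illustrative, not the kernel's. The patch does not say why the field is sized for ``NR_LRU_GENS+1`` values; one plausible reading, assumed here and not encoded below, is that one encoding is reserved for pages that are not on any generation list.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_LRU_GENS 4 /* hypothetical stand-in for CONFIG_NR_LRU_GENS */

/* Same result as the kernel's order_base_2(): ceil(log2(n)), 0 for n <= 1. */
static unsigned int order_base_2_ul(unsigned long n)
{
	unsigned int order = 0;

	if (n <= 1)
		return 0;
	for (n -= 1; n; n >>= 1)
		order++; /* counts the bit length of n-1 */
	return order;
}

/* Truncate a monotonically increasing sequence number to a list index. */
static unsigned int gen_index(unsigned long seq)
{
	return seq % NR_LRU_GENS;
}

/*
 * Sliding-window condition: the live generations [min_seq, max_seq] map to
 * distinct indices only while they span at most NR_LRU_GENS values.
 */
static bool window_ok(unsigned long min_seq, unsigned long max_seq)
{
	return min_seq <= max_seq && max_seq - min_seq < NR_LRU_GENS;
}
```

With ``min_seq`` = 6 and ``max_seq`` = 9, the four live generations land on indices 2, 3, 0 and 1. Letting ``max_seq`` reach 10 without advancing ``min_seq`` would make generation 10 reuse index 2, which is the overlap the sliding window prevents.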
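
The tier rule in the Framework section (a page accessed ``N`` times through file descriptors belongs to tier ``order_base_2(N)``) and the protection rule in the Eviction section can be sketched as follows. ``TIERS_PER_GEN`` is a hypothetical stand-in for ``CONFIG_TIERS_PER_GEN``; clamping to the top tier is an assumption based on each generation holding at most that many tiers; and ``protect_tier()`` encodes only the comparison stated in the Eviction section, not the PID-style feedback loop that decides when it applies.

```c
#include <assert.h>
#include <stdbool.h>

#define TIERS_PER_GEN 4 /* hypothetical stand-in for CONFIG_TIERS_PER_GEN */

/* Same result as the kernel's order_base_2(): ceil(log2(n)), 0 for n <= 1. */
static unsigned int order_base_2_ul(unsigned long n)
{
	unsigned int order = 0;

	if (n <= 1)
		return 0;
	for (n -= 1; n; n >>= 1)
		order++;
	return order;
}

/* Tier of a page accessed n times via file descriptors (clamped: assumption). */
static unsigned int tier_of(unsigned long n_accesses)
{
	unsigned int tier = order_base_2_ul(n_accesses);

	return tier < TIERS_PER_GEN ? tier : TIERS_PER_GEN - 1;
}

/*
 * Rule from the Eviction section: a page from an upper tier whose refaulted %
 * exceeds the base tier's is moved to the next generation instead of being
 * evicted. refaulted_pct[] is indexed by tier.
 */
static bool protect_tier(unsigned int tier,
			 const unsigned int refaulted_pct[TIERS_PER_GEN])
{
	return tier > 0 && refaulted_pct[tier] > refaulted_pct[0];
}
```

With ``TIERS_PER_GEN`` = 4, pages with 0 or 1 accesses stay in base tier 0, 2 accesses map to tier 1, 3 or 4 accesses to tier 2, and 5 or more to tier 3.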
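
The Eviction section's bookkeeping can be modeled with a small toy. Two assumptions are labeled in the code: when the anon and file ``min_seq[]`` values differ, the type holding the older (smaller) generation is picked, since the patch only says the choice is based on their values; and ``min_seq`` is never advanced past ``max_seq-1``, the point at which the Aging section says the aging becomes due. ``NR_GENS``, ``NR_ZONES``, ``select_type()`` and ``try_inc_min_seq()`` are hypothetical.

```c
#include <assert.h>

#define NR_GENS  4 /* hypothetical stand-in for CONFIG_NR_LRU_GENS */
#define NR_ZONES 3 /* hypothetical number of zones */

enum { TYPE_ANON, TYPE_FILE };

/*
 * Pick the type to evict from. Assumption: differing min_seq[] values favor
 * the older (smaller) generation. On equal min_seq[], the type with the lower
 * refaulted % wins, as the patch states; an exact tie falls back to file here.
 */
static int select_type(const unsigned long min_seq[2],
		       const unsigned int refaulted_pct[2])
{
	if (min_seq[TYPE_ANON] != min_seq[TYPE_FILE])
		return min_seq[TYPE_ANON] < min_seq[TYPE_FILE] ? TYPE_ANON
							       : TYPE_FILE;
	return refaulted_pct[TYPE_ANON] < refaulted_pct[TYPE_FILE] ? TYPE_ANON
								   : TYPE_FILE;
}

/*
 * Advance one type's min_seq while every per-zone list of its oldest
 * generation is empty. nr_pages[gen][zone] holds that type's page counts.
 * Assumption: min_seq stops at max_seq-1, where the aging becomes due.
 */
static void try_inc_min_seq(unsigned long *min_seq, unsigned long max_seq,
			    unsigned long nr_pages[NR_GENS][NR_ZONES])
{
	while (*min_seq + 1 < max_seq) {
		unsigned int gen = *min_seq % NR_GENS;

		for (int zone = 0; zone < NR_ZONES; zone++)
			if (nr_pages[gen][zone])
				return; /* oldest generation still has pages */
		(*min_seq)++;
	}
}
```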