mirror of
https://github.com/openwrt/openwrt.git
synced 2025-01-15 01:10:29 +00:00
162 lines
6.3 KiB
Diff
162 lines
6.3 KiB
Diff
|
From f59c618ed70a1e48accc4cad91a200966f2569c9 Mon Sep 17 00:00:00 2001
|
||
|
From: Yu Zhao <yuzhao@google.com>
|
||
|
Date: Tue, 2 Feb 2021 01:27:45 -0700
|
||
|
Subject: [PATCH 10/10] mm: multigenerational lru: documentation
|
||
|
|
||
|
Add Documentation/vm/multigen_lru.rst.
|
||
|
|
||
|
Signed-off-by: Yu Zhao <yuzhao@google.com>
|
||
|
Tested-by: Konstantin Kharlamov <Hi-Angel@yandex.ru>
|
||
|
Change-Id: I1902178bcbb5adfa0a748c4d284a6456059bdd7e
|
||
|
---
|
||
|
Documentation/vm/index.rst | 1 +
|
||
|
Documentation/vm/multigen_lru.rst | 132 ++++++++++++++++++++++++++++++
|
||
|
2 files changed, 133 insertions(+)
|
||
|
create mode 100644 Documentation/vm/multigen_lru.rst
|
||
|
|
||
|
--- a/Documentation/vm/index.rst
|
||
|
+++ b/Documentation/vm/index.rst
|
||
|
@@ -17,6 +17,7 @@ various features of the Linux memory man
|
||
|
|
||
|
swap_numa
|
||
|
zswap
|
||
|
+ multigen_lru
|
||
|
|
||
|
Kernel developers MM documentation
|
||
|
==================================
|
||
|
--- /dev/null
|
||
|
+++ b/Documentation/vm/multigen_lru.rst
|
||
|
@@ -0,0 +1,132 @@
|
||
|
+.. SPDX-License-Identifier: GPL-2.0
|
||
|
+
|
||
|
+=====================
|
||
|
+Multigenerational LRU
|
||
|
+=====================
|
||
|
+
|
||
|
+Quick Start
|
||
|
+===========
|
||
|
+Build Configurations
|
||
|
+--------------------
|
||
|
+:Required: Set ``CONFIG_LRU_GEN=y``.
|
||
|
+
|
||
|
+:Optional: Set ``CONFIG_LRU_GEN_ENABLED=y`` to turn the feature on by
|
||
|
+ default.
|
||
|
+
|
||
|
+Runtime Configurations
|
||
|
+----------------------
|
||
|
+:Required: Write ``1`` to ``/sys/kernel/mm/lru_gen/enable`` if the
|
||
|
+ feature was not turned on by default.
|
||
|
+
|
||
|
+:Optional: Write ``N`` to ``/sys/kernel/mm/lru_gen/min_ttl_ms`` to
|
||
|
+ protect the working set of ``N`` milliseconds. The OOM killer is
|
||
|
+ invoked if this working set cannot be kept in memory.
|
||
|
+
|
||
|
+:Optional: Read ``/sys/kernel/debug/lru_gen`` to confirm the feature
|
||
|
+ is turned on. This file has the following output:
|
||
|
+
|
||
|
+::
|
||
|
+
|
||
|
+ memcg memcg_id memcg_path
|
||
|
+ node node_id
|
||
|
+ min_gen birth_time anon_size file_size
|
||
|
+ ...
|
||
|
+ max_gen birth_time anon_size file_size
|
||
|
+
|
||
|
+``min_gen`` is the oldest generation number and ``max_gen`` is the
|
||
|
+youngest generation number. ``birth_time`` is in milliseconds.
|
||
|
+``anon_size`` and ``file_size`` are in pages.
|
||
|
+
|
||
|
+Phones/Laptops/Workstations
|
||
|
+---------------------------
|
||
|
+No additional configurations required.
|
||
|
+
|
||
|
+Servers/Data Centers
|
||
|
+--------------------
|
||
|
+:To support more generations: Change ``CONFIG_NR_LRU_GENS`` to a
|
||
|
+ larger number.
|
||
|
+
|
||
|
+:To support more tiers: Change ``CONFIG_TIERS_PER_GEN`` to a larger
|
||
|
+ number.
|
||
|
+
|
||
|
+:To support full stats: Set ``CONFIG_LRU_GEN_STATS=y``.
|
||
|
+
|
||
|
+:Working set estimation: Write ``+ memcg_id node_id max_gen
|
||
|
+ [swappiness] [use_bloom_filter]`` to ``/sys/kernel/debug/lru_gen`` to
|
||
|
+ invoke the aging, which scans PTEs for accessed pages and then
|
||
|
+ creates the next generation ``max_gen+1``. A swap file and a non-zero
|
||
|
+ ``swappiness``, which overrides ``vm.swappiness``, are required to
|
||
|
+ scan PTEs mapping anon pages. Set ``use_bloom_filter`` to 0 to
|
||
|
+ override the default behavior which only scans PTE tables found
|
||
|
+ populated.
|
||
|
+
|
||
|
+:Proactive reclaim: Write ``- memcg_id node_id min_gen [swappiness]
|
||
|
+ [nr_to_reclaim]`` to ``/sys/kernel/debug/lru_gen`` to invoke the
|
||
|
+ eviction, which evicts generations less than or equal to ``min_gen``.
|
||
|
+ ``min_gen`` should be less than ``max_gen-1`` as ``max_gen`` and
|
||
|
+ ``max_gen-1`` are not fully aged and therefore cannot be evicted.
|
||
|
+ Use ``nr_to_reclaim`` to limit the number of pages to evict. Multiple
|
||
|
+ command lines are supported, so does concatenation with delimiters
|
||
|
+ ``,`` and ``;``.
|
||
|
+
|
||
|
+Framework
|
||
|
+=========
|
||
|
+For each ``lruvec``, evictable pages are divided into multiple
|
||
|
+generations. The youngest generation number is stored in
|
||
|
+``lrugen->max_seq`` for both anon and file types as they are aged on
|
||
|
+an equal footing. The oldest generation numbers are stored in
|
||
|
+``lrugen->min_seq[]`` separately for anon and file types as clean
|
||
|
+file pages can be evicted regardless of swap and writeback
|
||
|
+constraints. These three variables are monotonically increasing.
|
||
|
+Generation numbers are truncated into
|
||
|
+``order_base_2(CONFIG_NR_LRU_GENS+1)`` bits in order to fit into
|
||
|
+``page->flags``. The sliding window technique is used to prevent
|
||
|
+truncated generation numbers from overlapping. Each truncated
|
||
|
+generation number is an index to an array of per-type and per-zone
|
||
|
+lists ``lrugen->lists``.
|
||
|
+
|
||
|
+Each generation is divided into multiple tiers. Tiers represent
|
||
|
+different ranges of numbers of accesses from file descriptors only.
|
||
|
+Pages accessed ``N`` times via file descriptors belong to tier
|
||
|
+``order_base_2(N)``. Each generation contains at most
|
||
|
+``CONFIG_TIERS_PER_GEN`` tiers, and they require additional
|
||
|
+``CONFIG_TIERS_PER_GEN-2`` bits in ``page->flags``. In contrast to
|
||
|
+moving between generations which requires list operations, moving
|
||
|
+between tiers only involves operations on ``page->flags`` and
|
||
|
+therefore has a negligible cost. A feedback loop modeled after the PID
|
||
|
+controller monitors refaulted % across all tiers and decides when to
|
||
|
+protect pages from which tiers.
|
||
|
+
|
||
|
+The framework comprises two conceptually independent components: the
|
||
|
+aging and the eviction, which can be invoked separately from user
|
||
|
+space for the purpose of working set estimation and proactive reclaim.
|
||
|
+
|
||
|
+Aging
|
||
|
+-----
|
||
|
+The aging produces young generations. Given an ``lruvec``, the aging
|
||
|
+traverses ``lruvec_memcg()->mm_list`` and calls ``walk_page_range()``
|
||
|
+to scan PTEs for accessed pages (a ``mm_struct`` list is maintained
|
||
|
+for each ``memcg``). Upon finding one, the aging updates its
|
||
|
+generation number to ``max_seq`` (modulo ``CONFIG_NR_LRU_GENS``).
|
||
|
+After each round of traversal, the aging increments ``max_seq``. The
|
||
|
+aging is due when ``min_seq[]`` reaches ``max_seq-1``.
|
||
|
+
|
||
|
+Eviction
|
||
|
+--------
|
||
|
+The eviction consumes old generations. Given an ``lruvec``, the
|
||
|
+eviction scans pages on the per-zone lists indexed by anon and file
|
||
|
+``min_seq[]`` (modulo ``CONFIG_NR_LRU_GENS``). It first tries to
|
||
|
+select a type based on the values of ``min_seq[]``. If they are
|
||
|
+equal, it selects the type that has a lower refaulted %. The eviction
|
||
|
+sorts a page according to its updated generation number if the aging
|
||
|
+has found this page accessed. It also moves a page to the next
|
||
|
+generation if this page is from an upper tier that has a higher
|
||
|
+refaulted % than the base tier. The eviction increments ``min_seq[]``
|
||
|
+of a selected type when it finds all the per-zone lists indexed by
|
||
|
+``min_seq[]`` of this selected type are empty.
|
||
|
+
|
||
|
+To-do List
|
||
|
+==========
|
||
|
+KVM Optimization
|
||
|
+----------------
|
||
|
+Support shadow page table walk.
|