mirror of
https://github.com/AuxXxilium/linux_dsm_epyc7002.git
synced 2024-12-16 23:06:40 +07:00
docs/vm: idle_page_tracking.txt: convert to ReST format

Signed-off-by: Mike Rapoport <rppt@linux.vnet.ibm.com>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
parent b53ba58845
commit e3f2025a57
@@ -1,4 +1,11 @@
-MOTIVATION
+.. _idle_page_tracking:
+
+==================
+Idle Page Tracking
+==================
+
+Motivation
+==========
 
 The idle page tracking feature allows to track which memory pages are being
 accessed by a workload and which are idle. This information can be useful for
@@ -8,10 +15,14 @@ or deciding where to place the workload within a compute cluster.
 
 It is enabled by CONFIG_IDLE_PAGE_TRACKING=y.
 
-USER API
+.. _user_api:
 
-The idle page tracking API is located at /sys/kernel/mm/page_idle. Currently,
-it consists of the only read-write file, /sys/kernel/mm/page_idle/bitmap.
+User API
+========
+
+The idle page tracking API is located at ``/sys/kernel/mm/page_idle``.
+Currently, it consists of the only read-write file,
+``/sys/kernel/mm/page_idle/bitmap``.
 
 The file implements a bitmap where each bit corresponds to a memory page. The
 bitmap is represented by an array of 8-byte integers, and the page at PFN #i is
@@ -19,8 +30,9 @@ mapped to bit #i%64 of array element #i/64, byte order is native. When a bit is
 set, the corresponding page is idle.
 
 A page is considered idle if it has not been accessed since it was marked idle
-(for more details on what "accessed" actually means see the IMPLEMENTATION
-DETAILS section). To mark a page idle one has to set the bit corresponding to
+(for more details on what "accessed" actually means see the :ref:`Implementation
+Details <impl_details>` section).
+To mark a page idle one has to set the bit corresponding to
 the page by writing to the file. A value written to the file is OR-ed with the
 current bitmap value.
 
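
Reviewer's aside: the PFN-to-bit mapping described in this part of the document can be sketched in Python. The helper names (``bitmap_location``, ``idle_word``) are hypothetical and not part of the kernel or of this patch. The sketch assumes only what the text states: 8-byte array elements in native byte order, with the page at PFN i mapped to bit i % 64 of element i / 64, and written values OR-ed into the bitmap.

```python
import struct

WORD_BITS = 64   # each bitmap element is an 8-byte integer
WORD_BYTES = 8

def bitmap_location(pfn):
    """Return (byte offset of the array element, bit index) for a PFN."""
    return (pfn // WORD_BITS) * WORD_BYTES, pfn % WORD_BITS

def idle_word(pfns_in_same_word):
    """Pack the 8-byte value that would mark the given PFNs idle.

    All PFNs must fall into the same array element. Because written values
    are OR-ed with the current bitmap, bits left at 0 do not affect the
    state of other pages.
    """
    elements = {pfn // WORD_BITS for pfn in pfns_in_same_word}
    if len(elements) != 1:
        raise ValueError("PFNs must share exactly one array element")
    value = 0
    for pfn in pfns_in_same_word:
        value |= 1 << (pfn % WORD_BITS)
    # "=Q" is an unsigned 64-bit integer in native byte order.
    return struct.pack("=Q", value)
```

For example, PFN 130 belongs to array element 2, so ``bitmap_location(130)`` yields byte offset 16 and bit index 2. The packed value from ``idle_word`` would be written at that byte offset.
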
@@ -30,9 +42,9 @@ page types (e.g. SLAB pages) an attempt to mark a page idle is silently ignored,
 and hence such pages are never reported idle.
 
 For huge pages the idle flag is set only on the head page, so one has to read
-/proc/kpageflags in order to correctly count idle huge pages.
+``/proc/kpageflags`` in order to correctly count idle huge pages.
 
-Reading from or writing to /sys/kernel/mm/page_idle/bitmap will return
+Reading from or writing to ``/sys/kernel/mm/page_idle/bitmap`` will return
 -EINVAL if you are not starting the read/write on an 8-byte boundary, or
 if the size of the read/write is not a multiple of 8 bytes. Writing to
 this file beyond max PFN will return -ENXIO.
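
Reviewer's aside: the alignment rules quoted in this hunk imply that a caller usually has to widen a PFN range to whole 8-byte elements before reading or writing. A minimal sketch of that arithmetic follows; ``aligned_span`` is a hypothetical helper name, not an existing API.

```python
def aligned_span(first_pfn, last_pfn):
    """Return (offset, length) in bytes covering PFNs first_pfn..last_pfn.

    Both values are multiples of 8, which is what the bitmap file requires;
    other offsets or sizes are rejected with -EINVAL.
    """
    if first_pfn < 0 or last_pfn < first_pfn:
        raise ValueError("invalid PFN range")
    first_word = first_pfn // 64   # array element holding the first PFN
    last_word = last_pfn // 64     # array element holding the last PFN
    return first_word * 8, (last_word - first_word + 1) * 8
```

The returned span may include bits for neighbouring PFNs outside the requested range, so a reader has to mask those out afterwards.
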
@@ -41,21 +53,25 @@ That said, in order to estimate the amount of pages that are not used by a
 workload one should:
 
 1. Mark all the workload's pages as idle by setting corresponding bits in
-/sys/kernel/mm/page_idle/bitmap. The pages can be found by reading
-/proc/pid/pagemap if the workload is represented by a process, or by
-filtering out alien pages using /proc/kpagecgroup in case the workload is
-placed in a memory cgroup.
+``/sys/kernel/mm/page_idle/bitmap``. The pages can be found by reading
+``/proc/pid/pagemap`` if the workload is represented by a process, or by
+filtering out alien pages using ``/proc/kpagecgroup`` in case the workload
+is placed in a memory cgroup.
 
 2. Wait until the workload accesses its working set.
 
-3. Read /sys/kernel/mm/page_idle/bitmap and count the number of bits set. If
-one wants to ignore certain types of pages, e.g. mlocked pages since they
-are not reclaimable, he or she can filter them out using /proc/kpageflags.
+3. Read ``/sys/kernel/mm/page_idle/bitmap`` and count the number of bits set.
+If one wants to ignore certain types of pages, e.g. mlocked pages since they
+are not reclaimable, he or she can filter them out using
+``/proc/kpageflags``.
 
-See Documentation/vm/pagemap.txt for more information about /proc/pid/pagemap,
-/proc/kpageflags, and /proc/kpagecgroup.
+See Documentation/vm/pagemap.txt for more information about
+``/proc/pid/pagemap``, ``/proc/kpageflags``, and ``/proc/kpagecgroup``.
 
-IMPLEMENTATION DETAILS
+.. _impl_details:
 
+Implementation Details
+======================
+
 The kernel internally keeps track of accesses to user memory pages in order to
 reclaim unreferenced pages first on memory shortage conditions. A page is
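
Reviewer's aside: the data handling behind steps 1 and 3 of the procedure in this hunk can be sketched as follows. The function names are hypothetical. The sketch works on bytes that the caller has already read, so it performs no file I/O itself. It assumes the ``/proc/pid/pagemap`` entry layout in which bit 63 means "page present" and bits 0-54 hold the PFN, and it assumes the bitmap data starts at the array element containing ``first_pfn``.

```python
import struct

def pfn_from_pagemap_entry(entry):
    """Decode one 64-bit /proc/pid/pagemap entry; return the PFN or None."""
    present = (entry >> 63) & 1          # bit 63: page present in RAM
    if not present:
        return None
    return entry & ((1 << 55) - 1)       # bits 0-54: page frame number

def count_idle(bitmap_bytes, first_pfn, pfns):
    """Count how many of `pfns` are still flagged idle.

    `bitmap_bytes` is assumed to be data read from the bitmap file starting
    at the array element that contains `first_pfn` (a multiple of 64).
    PFNs that fall outside the supplied data are not counted.
    """
    words = struct.unpack("=%dQ" % (len(bitmap_bytes) // 8), bitmap_bytes)
    idle = 0
    for pfn in pfns:
        index, bit = divmod(pfn - first_pfn, 64)
        if 0 <= index < len(words) and (words[index] >> bit) & 1:
            idle += 1
    return idle
```

Counting only the PFNs collected in step 1, rather than every set bit, keeps pages that belong to other workloads out of the estimate.
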
@@ -77,7 +93,8 @@ When a dirty page is written to swap or disk as a result of memory reclaim or
 exceeding the dirty memory limit, it is not marked referenced.
 
 The idle memory tracking feature adds a new page flag, the Idle flag. This flag
-is set manually, by writing to /sys/kernel/mm/page_idle/bitmap (see the USER API
+is set manually, by writing to ``/sys/kernel/mm/page_idle/bitmap`` (see the
+:ref:`User API <user_api>`
 section), and cleared automatically whenever a page is referenced as defined
 above.
 