[PATCH 1/1] process_madvise.2: Add process_madvise man page
Michael Kerrisk (man-pages)
mtk.manpages at gmail.com
Thu Jan 28 12:24:12 UTC 2021
Hello Suren,
Thank you for writing this page! Some comments below.
On Wed, 20 Jan 2021 at 21:36, Suren Baghdasaryan <surenb at google.com> wrote:
>
> Initial version of process_madvise(2) manual page. Initial text was
> extracted from [1], amended after fix [2] and more details added using
> man pages of madvise(2) and process_vm_read(2) as examples. It also
> includes the changes to required permission proposed in [3].
>
> [1] https://lore.kernel.org/patchwork/patch/1297933/
> [2] https://lkml.org/lkml/2020/12/8/1282
> [3] https://patchwork.kernel.org/project/selinux/patch/20210111170622.2613577-1-surenb@google.com/#23888311
>
> Signed-off-by: Suren Baghdasaryan <surenb at google.com>
> Signed-off-by: Minchan Kim <minchan at kernel.org>
> ---
>
> Adding the plain text version for ease of review:
Thanks for adding the rendered version. I will make my comments
against the source, below.
> NAME
> process_madvise - give advice about use of memory to a process
>
> SYNOPSIS
> #include <sys/uio.h>
>
> ssize_t process_madvise(int pidfd,
> const struct iovec *iovec,
> unsigned long vlen,
> int advice,
> unsigned int flags);
>
> DESCRIPTION
> The process_madvise() system call is used to give advice or directions to
> the kernel about the address ranges from external process as well as local
> process. It provides the advice to address ranges of process described by
> iovec and vlen. The goal of such advice is to improve system or application
> performance.
>
> The pidfd selects the process referred to by the PID file descriptor
> specified in pidfd. (see pidofd_open(2) for further information).
>
> The pointer iovec points to an array of iovec structures, defined in
> <sys/uio.h> as:
>
> struct iovec {
> void *iov_base; /* Starting address */
> size_t iov_len; /* Number of bytes to transfer */
> };
>
> The iovec describes address ranges beginning at iov_base address and with
> the size of iov_len bytes.
>
> The vlen represents the number of elements in iovec.
>
> The advice can be one of the values listed below.
>
> Linux-specific advice values
> The following Linux-specific advice values have no counterparts in the
> POSIX-specified posix_madvise(3), and may or may not have counterparts in
> the madvise() interface available on other implementations.
>
> MADV_COLD (since Linux 5.4.1)
> Deactivate a given range of pages by moving them from active to
> inactive LRU list. This is done to accelerate the reclaim of these
> pages. The advice might be ignored for some pages in the range when it
> is not applicable.
> MADV_PAGEOUT (since Linux 5.4.1)
> Reclaim a given range of pages. This is done to free up memory occupied
> by these pages. If a page is anonymous it will be swapped out. If a
> page is file-backed and dirty it will be written back into the backing
> storage. The advice might be ignored for some pages in the range when
> it is not applicable.
>
> The flags argument is reserved for future use; currently, this argument must
> be specified as 0.
>
> The value specified in the vlen argument must be less than or equal to
> IOV_MAX (defined in <limits.h> or accessible via the call
> sysconf(_SC_IOV_MAX)).
>
> The vlen and iovec arguments are checked before applying any hints. If the
> vlen is too big, or iovec is invalid, an error will be returned
> immediately.
>
> Hint might be applied to a part of iovec if one of its elements points to
> an invalid memory region in the remote process. No further elements will be
> processed beyond that point.
>
> Permission to provide a hint to external process is governed by a ptrace
> access mode PTRACE_MODE_READ_REALCREDS check; see ptrace(2) and
> CAP_SYS_ADMIN capability that caller should have in order to affect
> performance of an external process.
>
> RETURN VALUE
> On success, process_madvise() returns the number of bytes advised. This
> return value may be less than the total number of requested bytes, if an
> error occurred. The caller should check return value to determine whether
> a partial advice occurred.
So there are three return values possible,
> ERRORS
> EFAULT The memory described by iovec is outside the accessible address
> space of the process pid.
s/pid/
of the process referred to by
.IR pidfd .
> EINVAL flags is not 0.
> EINVAL The sum of the iov_len values of iovec overflows a ssize_t value.
> EINVAL vlen is too large.
> ENOMEM Could not allocate memory for internal copies of the iovec
> structures.
> EPERM The caller does not have permission to access the address space of
> the process pidfd.
> ESRCH No process with ID pidfd exists.
>
> VERSIONS
> Since Linux 5.10, support for this system call is optional, depending on
> the setting of the CONFIG_ADVISE_SYSCALLS configuration option.
>
> SEE ALSO
> madvise(2), pidofd_open(2), process_vm_readv(2), process_vm_write(2)
>
> man2/process_madvise.2 | 208 +++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 208 insertions(+)
> create mode 100644 man2/process_madvise.2
>
> diff --git a/man2/process_madvise.2 b/man2/process_madvise.2
> new file mode 100644
> index 000000000..9bb5cb5ed
> --- /dev/null
> +++ b/man2/process_madvise.2
> @@ -0,0 +1,208 @@
> +.\" Copyright (C) 2021 Suren Baghdasaryan <surenb at google.com>
> +.\" and Copyright (C) 2021 Minchan Kim <minchan at kernel.org>
> +.\"
> +.\" %%%LICENSE_START(VERBATIM)
> +.\" Permission is granted to make and distribute verbatim copies of this
> +.\" manual provided the copyright notice and this permission notice are
> +.\" preserved on all copies.
> +.\"
> +.\" Permission is granted to copy and distribute modified versions of this
> +.\" manual under the conditions for verbatim copying, provided that the
> +.\" entire resulting derived work is distributed under the terms of a
> +.\" permission notice identical to this one.
> +.\"
> +.\" Since the Linux kernel and libraries are constantly changing, this
> +.\" manual page may be incorrect or out-of-date. The author(s) assume no
> +.\" responsibility for errors or omissions, or for damages resulting from
> +.\" the use of the information contained herein. The author(s) may not
> +.\" have taken the same level of care in the production of this manual,
> +.\" which is licensed free of charge, as they might when working
> +.\" professionally.
> +.\"
> +.\" Formatted or processed versions of this manual, if unaccompanied by
> +.\" the source, must acknowledge the copyright and authors of this work.
> +.\" %%%LICENSE_END
> +.\"
> +.\" Commit ecb8ac8b1f146915aa6b96449b66dd48984caacc
> +.\"
> +.TH PROCESS_MADVISE 2 2021-01-12 "Linux" "Linux Programmer's Manual"
> +.SH NAME
> +process_madvise \- give advice about use of memory to a process
> +.SH SYNOPSIS
> +.nf
> +.B #include <sys/uio.h>
> +.PP
> +.BI "ssize_t process_madvise(int " pidfd ,
> +.BI " const struct iovec *" iovec ,
> +.BI " unsigned long " vlen ,
> +.BI " int " advice ,
> +.BI " unsigned int " flags ");"
> +.fi
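For anyone testing the page against a running kernel, here is a minimal sketch of invoking the call through syscall(2), on the assumption that the C library provides no process_madvise() wrapper. The helper name my_process_madvise() is hypothetical, and the ENOSYS fallback is an assumption covering kernel headers that do not define SYS_process_madvise.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical helper (not part of any library): invoke the raw system call. */
static ssize_t
my_process_madvise(int pidfd, const struct iovec *iovec,
                   unsigned long vlen, int advice, unsigned int flags)
{
#ifdef SYS_process_madvise
    return syscall(SYS_process_madvise, pidfd, iovec, vlen, advice, flags);
#else
    /* Assumption: the installed kernel headers predate the system call. */
    errno = ENOSYS;
    return -1;
#endif
}
```

The helper returns -1 with errno set on failure, like any syscall(2)-based wrapper, so callers can test the result the same way they would for a library-provided function.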
> +.SH DESCRIPTION
> +The
> +.BR process_madvise()
> +system call is used to give advice or directions
> +to the kernel about the address ranges from external process as well as
s/from external/of other/
> +local process. It provides the advice to address ranges of process
s/local/of the calling/
Please start new sentences on new lines. (See the discussion of
semantic newlines in man-pages(7).)
> +described by
> +.I iovec
> +and
> +.I vlen\.
> +The goal of such advice is to improve system or application performance.
> +.PP
> +The
> +.I pidfd
> +selects the process referred to by the PID file descriptor
> +specified in pidfd. (see
> +.BR pidofd_open(2)
> +for further information).
Rewrite the previous as:
[[
The
.I pidfd
argument is a PID file descriptor (see
.BR pidfd_open (2))
that specifies the process to which the advice is to be applied.
]]
> +.PP
> +The pointer
> +.I iovec
> +points to an array of iovec structures, defined in
"iovec" should be formatted as
.I iovec
> +.IR <sys/uio.h>
> +as:
> +.PP
> +.in +4n
> +.EX
> +struct iovec {
> + void *iov_base; /* Starting address */
> + size_t iov_len; /* Number of bytes to transfer */
> +};
> +.EE
> +.in
> +.PP
> +The
> +.I iovec
> +describes address ranges beginning at
s/describes/structure describes/
> +.I iov_base
> +address and with the size of
> +.I iov_len
> +bytes.
> +.PP
> +The
> +.I vlen
> +represents the number of elements in
> +.I iovec\.
==>
the
.IR iovec
structure.
> +.PP
> +The
> +.I advice
> +can be one of the values listed below.
s/can be/argument is/
> +.\"
> +.\" ======================================================================
> +.\"
> +.SS Linux-specific advice values
> +The following Linux-specific
> +.I advice
> +values have no counterparts in the POSIX-specified
> +.BR posix_madvise (3),
> +and may or may not have counterparts in the
> +.BR madvise ()
> +interface available on other implementations.
> +.TP
> +.BR MADV_COLD " (since Linux 5.4.1)"
> +.\" commit 9c276cc65a58faf98be8e56962745ec99ab87636
> +Deactivate a given range of pages by moving them from active to inactive
> +LRU list. This is done to accelerate the reclaim of these pages. The advice
New sentences on new lines.
> +might be ignored for some pages in the range when it is not applicable.
> +.TP
> +.BR MADV_PAGEOUT " (since Linux 5.4.1)"
> +.\" commit 1a4e58cce84ee88129d5d49c064bd2852b481357
> +Reclaim a given range of pages. This is done to free up memory occupied by
> +these pages. If a page is anonymous it will be swapped out. If a page is
> +file-backed and dirty it will be written back into the backing storage.
s/into/to/
> +The advice might be ignored for some pages in the range when it is not
> +applicable.
> +.PP
> +The
> +.I flags
> +argument is reserved for future use; currently, this argument must be
> +specified as 0.
> +.PP
> +The value specified in the
> +.I vlen
> +argument must be less than or equal to
> +.BR IOV_MAX
> +(defined in
> +.I <limits.h>
> +or accessible via the call
> +.IR sysconf(_SC_IOV_MAX) ).
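The limit described here can be checked at run time before making the call. A minimal sketch follows; vlen_within_limit() is a hypothetical helper name, and treating a sysconf() result of -1 as "no known limit" is an assumption of this sketch rather than something the page specifies.

```c
#include <unistd.h>

/* Hypothetical helper: return 1 if vlen does not exceed IOV_MAX, else 0. */
static int
vlen_within_limit(unsigned long vlen)
{
    long iov_max = sysconf(_SC_IOV_MAX);

    if (iov_max == -1)      /* Limit indeterminate or not supported */
        return 1;
    return vlen <= (unsigned long) iov_max;
}
```

A caller with more ranges than the limit allows would need to split them across several calls.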
> +.PP
> +The
> +.I vlen
> +and
> +.I iovec
> +arguments are checked before applying any hints.
> +If the
> +.I vlen
> +is too big, or
> +.I iovec
> +is invalid, an error will be returned immediately.
> +.PP
> +Hint might be applied to a part of
s/Hint/The hint/
> +.I iovec
> +if one of its elements points to an invalid memory
> +region in the remote process. No further elements will be
> +processed beyond that point.
> +.PP
> +Permission to provide a hint to external process is governed by a
> +ptrace access mode
> +.B PTRACE_MODE_READ_REALCREDS
> +check; see
> +.BR ptrace (2)
> +and
> +.B CAP_SYS_ADMIN
> +capability that caller should have in order to affect performance
> +of an external process.
The preceding sentence is garbled. Missing words?
> +.SH RETURN VALUE
> +On success, process_madvise() returns the number of bytes advised.
> +This return value may be less than the total number of requested
> +bytes, if an error occurred. The caller should check return value
> +to determine whether a partial advice occurred.
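As an illustration of the check this paragraph asks the caller to make, here is a minimal sketch that compares the return value with the total of the iov_len fields. Both helper names are hypothetical, and the sketch assumes the total does not overflow size_t.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: total number of bytes requested across all elements.
   Assumes the sum fits in a size_t. */
static size_t
iov_total_len(const struct iovec *iov, size_t vlen)
{
    size_t total = 0;

    for (size_t i = 0; i < vlen; i++)
        total += iov[i].iov_len;
    return total;
}

/* Hypothetical helper: classify a process_madvise() return value.
   Returns -1 if the call failed, 1 if fewer bytes were advised than
   requested, and 0 if every requested byte was advised. */
static int
advice_was_partial(ssize_t ret, const struct iovec *iov, size_t vlen)
{
    if (ret < 0)
        return -1;
    return (size_t) ret < iov_total_len(iov, vlen);
}
```

A result of 1 tells the caller only that processing stopped early; which element was the last one advised has to be worked out from the byte count.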
> +.SH ERRORS
> +.TP
> +.B EFAULT
> +The memory described by
> +.I iovec
> +is outside the accessible address space of the process pid.
s/process pid./
the process referred to by
.IR pidfd .
/
> +.TP
> +.B EINVAL
> +.I flags
> +is not 0.
> +.TP
> +.B EINVAL
> +The sum of the
> +.I iov_len
> +values of
> +.I iovec
> +overflows a ssize_t value.
.I ssize_t
> +.TP
> +.B EINVAL
> +.I vlen
> +is too large.
> +.TP
> +.B ENOMEM
> +Could not allocate memory for internal copies of the
> +.I iovec
> +structures.
> +.TP
> +.B EPERM
> +The caller does not have permission to access the address space of the process
> +.I pidfd.
.IR pidfd .
> +.TP
> +.B ESRCH
> +No process with ID
> +.I pidfd
> +exists.
Should this maybe be:
[[
The target process does not exist (i.e., it has terminated and
been waited on).
]]
See pidfd_send_signal(2).
Also, is an EBADF error possible? Again, see pidfd_send_signal(2).
> +.SH VERSIONS
> +Since Linux 5.10,
Better: This system call first appeared in Linux 5.10.
> +.\" commit ecb8ac8b1f146915aa6b96449b66dd48984caacc
> +support for this system call is optional,
s/support/Support/
> +depending on the setting of the
> +.B CONFIG_ADVISE_SYSCALLS
> +configuration option.
> +.SH SEE ALSO
> +.BR madvise (2),
> +.BR pidofd_open(2),
> +.BR process_vm_readv (2),
> +.BR process_vm_write (2)
Thanks,
Michael
--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Linux/UNIX System Programming Training: http://man7.org/training/