[PATCH v8 4/9] coredump: add coredump socket
Paul Moore
paul at paul-moore.com
Thu May 22 23:25:18 UTC 2025
On Fri, May 16, 2025 at 7:27 AM Christian Brauner <brauner at kernel.org> wrote:
>
> Coredumping currently supports two modes:
>
> (1) Dumping directly into a file somewhere on the filesystem.
> (2) Dumping into a pipe connected to a usermode helper process
> spawned as a child of the system_unbound_wq or kthreadd.
>
> For simplicity I'm mostly ignoring (1). There's probably still some
> users of (1) out there but processing coredumps in this way can be
> considered adventurous especially in the face of set*id binaries.
>
> The most common option should be (2) by now. It works by allowing
> userspace to put a string into /proc/sys/kernel/core_pattern like:
>
> |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
>
> The "|" at the beginning indicates to the kernel that a pipe must be
> used. The path following the pipe indicator is a path to a binary that
> will be spawned as a usermode helper process. Any additional parameters
> pass information about the task that is generating the coredump to the
> binary that processes the coredump.
>
> In the example core_pattern shown above systemd-coredump is spawned as a
> usermode helper. There's various conceptual consequences of this
> (non-exhaustive list):
>
> - systemd-coredump is spawned with file descriptor number 0 (stdin)
> connected to the read-end of the pipe. All other file descriptors are
> closed. That specifically includes 1 (stdout) and 2 (stderr). This has
> already caused bugs because userspace assumed that this cannot happen
> (Whether or not this is a sane assumption is irrelevant.).
>
> - systemd-coredump will be spawned as a child of system_unbound_wq. So
> it is not a child of any userspace process and specifically not a
> child of PID 1. It cannot be waited upon and is in a weird hybrid
> upcall which are difficult for userspace to control correctly.
>
> - systemd-coredump is spawned with full kernel privileges. This
> necessitates all kinds of weird privilege dropping excercises in
> userspace to make this safe.
>
> - A new usermode helper has to be spawned for each crashing process.
>
> This series adds a new mode:
>
> (3) Dumping into an AF_UNIX socket.
>
> Userspace can set /proc/sys/kernel/core_pattern to:
>
> @/path/to/coredump.socket
>
> The "@" at the beginning indicates to the kernel that an AF_UNIX
> coredump socket will be used to process coredumps.
>
> The coredump socket must be located in the initial mount namespace.
> When a task coredumps it opens a client socket in the initial network
> namespace and connects to the coredump socket.
>
> - The coredump server uses SO_PEERPIDFD to get a stable handle on the
> connected crashing task. The retrieved pidfd will provide a stable
> reference even if the crashing task gets SIGKILLed while generating
> the coredump.
>
> - By setting core_pipe_limit non-zero userspace can guarantee that the
> crashing task cannot be reaped behind it's back and thus process all
> necessary information in /proc/<pid>. The SO_PEERPIDFD can be used to
> detect whether /proc/<pid> still refers to the same process.
>
> The core_pipe_limit isn't used to rate-limit connections to the
> socket. This can simply be done via AF_UNIX sockets directly.
>
> - The pidfd for the crashing task will grow new information how the task
> coredumps.
>
> - The coredump server should mark itself as non-dumpable.
>
> - A container coredump server in a separate network namespace can simply
> bind to another well-know address and systemd-coredump fowards
> coredumps to the container.
>
> - Coredumps could in the future also be handled via per-user/session
> coredump servers that run only with that users privileges.
>
> The coredump server listens on the coredump socket and accepts a
> new coredump connection. It then retrieves SO_PEERPIDFD for the
> client, inspects uid/gid and hands the accepted client to the users
> own coredump handler which runs with the users privileges only
> (It must of coure pay close attention to not forward crashing suid
> binaries.).
>
> The new coredump socket will allow userspace to not have to rely on
> usermode helpers for processing coredumps and provides a safer way to
> handle them instead of relying on super privileged coredumping helpers
> that have and continue to cause significant CVEs.
>
> This will also be significantly more lightweight since no fork()+exec()
> for the usermodehelper is required for each crashing process. The
> coredump server in userspace can e.g., just keep a worker pool.
>
> Acked-by: Luca Boccassi <luca.boccassi at gmail.com>
> Reviewed-by: Kuniyuki Iwashima <kuniyu at amazon.com>
> Reviewed-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn at canonical.com>
> Signed-off-by: Christian Brauner <brauner at kernel.org>
> ---
> fs/coredump.c | 118 ++++++++++++++++++++++++++++++++++++++++++++++++++--
> include/linux/net.h | 1 +
> net/unix/af_unix.c | 54 ++++++++++++++++++------
> 3 files changed, 156 insertions(+), 17 deletions(-)
Reviewed-by: Paul Moore <paul at paul-moore.com>
--
paul-moore.com
More information about the Linux-security-module-archive
mailing list