[PATCH RFC v3 00/10] coredump: add coredump socket
Mickaël Salaün
mic at digikod.net
Mon May 5 15:38:23 UTC 2025
On Mon, May 05, 2025 at 04:56:04PM +0200, Christian Brauner wrote:
> On Mon, May 05, 2025 at 04:41:28PM +0200, Mickaël Salaün wrote:
> > On Mon, May 05, 2025 at 01:13:38PM +0200, Christian Brauner wrote:
> > > Coredumping currently supports two modes:
> > >
> > > (1) Dumping directly into a file somewhere on the filesystem.
> > > (2) Dumping into a pipe connected to a usermode helper process
> > > spawned as a child of the system_unbound_wq or kthreadd.
> > >
> > > For simplicity I'm mostly ignoring (1). There's probably still some
> > > users of (1) out there but processing coredumps in this way can be
> > > considered adventurous especially in the face of set*id binaries.
> > >
> > > The most common option should be (2) by now. It works by allowing
> > > userspace to put a string into /proc/sys/kernel/core_pattern like:
> > >
> > > |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h
> > >
> > > The "|" at the beginning indicates to the kernel that a pipe must be
> > > used. The path following the pipe indicator is a path to a binary that
> > > will be spawned as a usermode helper process. Any additional parameters
> > > pass information about the task that is generating the coredump to the
> > > binary that processes the coredump.
> > >
> > > In the example core_pattern shown above systemd-coredump is spawned as a
> > > usermode helper. There's various conceptual consequences of this
> > > (non-exhaustive list):
> > >
> > > - systemd-coredump is spawned with file descriptor number 0 (stdin)
> > > connected to the read-end of the pipe. All other file descriptors are
> > > closed. That specifically includes 1 (stdout) and 2 (stderr). This has
> > > already caused bugs because userspace assumed that this cannot happen
> > > (Whether or not this is a sane assumption is irrelevant.).
> > >
> > > - systemd-coredump will be spawned as a child of system_unbound_wq. So
> > > it is not a child of any userspace process and specifically not a
> > > child of PID 1. It cannot be waited upon and runs as a weird hybrid
> > > upcall, which is difficult for userspace to control correctly.
> > >
> > > - systemd-coredump is spawned with full kernel privileges. This
> > > necessitates all kinds of weird privilege dropping exercises in
> > > userspace to make this safe.
> > >
> > > - A new usermode helper has to be spawned for each crashing process.
> > >
> > > This series adds a new mode:
> > >
> > > (3) Dumping into an abstract AF_UNIX socket.
> > >
> > > Userspace can set /proc/sys/kernel/core_pattern to:
> > >
> > > @linuxafsk/coredump_socket
> > >
> > > The "@" at the beginning indicates to the kernel that the abstract
> > > AF_UNIX coredump socket will be used to process coredumps.
> > >
> > > The coredump socket uses the fixed address "linuxafsk/coredump.socket"
> > > for now.
> > >
> > > The coredump socket is located in the initial network namespace. To bind
> > > the coredump socket userspace must hold CAP_SYS_ADMIN in the initial
> > > user namespace. Listening and reading can happen from whatever
> > > unprivileged context is necessary to safely process coredumps.
> > >
> > > When a task coredumps it opens a client socket in the initial network
> > > namespace and connects to the coredump socket. For now only tasks that
> > > are actually coredumping are allowed to connect to the initial coredump
> > > socket.
> >
> > I think we should avoid using abstract UNIX sockets, especially for new
>
> Abstract unix sockets are at the core of a modern Linux system. During
> boot alone about 100 or so are created on a modern system when I counted
> during testing. Sorry, but this is a no-show argument.
That this kind of socket is widely used does not mean it should be used
for new interfaces. :)

AFAIK, these socket types are currently only used for IPC between user
space processes, not between a kernel interface and user space. This
patch series changes this assumption.
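
For readers less familiar with abstract AF_UNIX sockets, here is a minimal
Python sketch of the ordinary user-space IPC case described above. It is
Linux-only. The socket name is made up for illustration and is not the
coredump address from the series; the helper name abstract_roundtrip is
likewise hypothetical.

```python
import os
import socket

# Hypothetical name, for illustration only. The leading NUL byte is what makes
# the address "abstract": it lives in the network namespace, not on the
# filesystem, so no file permissions apply to it.
ABSTRACT_NAME = b"\0example/demo-%d.socket" % os.getpid()


def abstract_roundtrip(payload: bytes) -> bytes:
    """Send payload over an abstract AF_UNIX stream socket and return
    the bytes the listening side received."""
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        server.bind(ABSTRACT_NAME)
        server.listen(1)
        # For AF_UNIX, connect() completes once the request is queued in the
        # listener's backlog, so a single thread can play both roles here.
        client.connect(ABSTRACT_NAME)
        conn, _ = server.accept()
        with conn:
            client.sendall(payload)
            client.shutdown(socket.SHUT_WR)  # signal end of data to the reader
            chunks = []
            while True:
                data = conn.recv(4096)
                if not data:
                    break
                chunks.append(data)
            return b"".join(chunks)
    finally:
        client.close()
        server.close()
```

Any process in the same network namespace that knows the name can attempt
the connect() above, which is why access control has to come from
elsewhere (e.g. an LSM policy).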
Security policies already in place can block abstract connections, and
it might not be possible to differentiate between a kernel and a user
space peer with the current configuration. Please Cc the LSM mailing
list for such new interfaces.

You cut and ignored most of my reply, which explained my reasoning and
proposed an alternative:
> > interfaces, because it is hard to properly control such access. Can we
> > create new dedicated AF_UNIX protocols instead? One could be used by a
> > privileged process in the initial namespace to create a socket to
> > collect coredumps, and the other could be dedicated to coredumped
> > processes. Such (coredump collector) file descriptor or new (proxy)
> > socketpair ones could be passed to containers.
Only one new "protocol" would be required though (because the client
side is created by the kernel). That would be a backward compatible
change, and such a socket type could easily be identified by other parts
of the kernel, including access control mechanisms.
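
For reference, the three core_pattern modes discussed in the quoted cover
letter can be summarized with a small Python sketch. It only looks at the
first character, as the cover letter describes. The function name
classify_core_pattern is hypothetical, and the real kernel parsing handles
more detail than this.

```python
def classify_core_pattern(pattern: str) -> str:
    """Return which coredump mode a core_pattern string selects.

    Assumption: only the leading character matters, per the cover letter.
    """
    if pattern.startswith("|"):
        return "pipe"    # (2) usermode helper fed through a pipe
    if pattern.startswith("@"):
        return "socket"  # (3) abstract AF_UNIX socket proposed in this RFC
    return "file"        # (1) plain file on the filesystem
```

With the examples from the thread,
classify_core_pattern("|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h")
selects the pipe mode and classify_core_pattern("@linuxafsk/coredump_socket")
selects the socket mode.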