[GIT PULL] Block fixes for 6.18-rc3
Christian Brauner
brauner at kernel.org
Fri Oct 31 15:53:13 UTC 2025
On Fri, Oct 31, 2025 at 04:43:54PM +0100, Christian Brauner wrote:
> On Fri, Oct 24, 2025 at 01:31:11PM -0700, Linus Torvalds wrote:
> > [ Adding LSM people. Also Christian, because he did the cred refcount
>
> Sorry, late to the party. I was working on other stuf. Let me see...
>
> > cleanup with override_creds() and friends last year, and I'm
> > suggesting taking that one step further ]
> >
> > On Fri, 24 Oct 2025 at 06:58, Jens Axboe <axboe at kernel.dk> wrote:
> > >
> > > Ondrej Mosnacek (1):
> > > nbd: override creds to kernel when calling sock_{send,recv}msg()
> >
> > I've pulled this, but looking at the patch, I note that more than half
> > the patch - 75% to be exact - is just boilerplate for "I need to
> > allocate the kernel cred and deal with error handling there".
> >
> > It literally has three lines of new actual useful code (two statements
> > and one local variable declaration), and then nine lines of the "setup
> > dance".
> >
> > Which isn't wrong, but when the infrastructure boilerplate is three
> > times more than the actual code, it makes me think we should maybe
> > just get rid of the
> >
> > my_kernel_cred = prepare_kernel_cred(&init_task);
> >
> > pattern for this use-case, and just let people use "init_cred"
> > directly for things like this.
> >
> > Because that's essentially what that prepare_kernel_cred() thing
> > returns, except it allocates a new copy of said thing, so now you have
> > error handling and you have to free it after-the-fact.
> >
> > And I'm not seeing that the extra error handling and freeing dance
> > actually buys us anything at all.
> >
> > Now, some *other* users actually go on to change the creds: they want
> > that prepare_kernel_cred() dance because they then actually do
> > something else like using their own keyring or whatever (eg the NFS
> > idmap code or some other filesystem stuff).
> >
> > So it's not like prepare_kernel_cred() is wrong, but in this kind of
> > case where people just go "I'm a driver with hardware access, I want
> > to do something with kernel privileges not user privileges", it
> > actually seems counterproductive to have extra code just to complicate
> > things.
> >
> > Now, my gut feel is that if we just let people use 'init_cred'
> > directly, we should also make sure that it's always exposed as a
> > 'const struct cred' , but wouldn't that be a whole lot simpler and
> > more straightforward?
> >
> > This is *not* the only use case of that.
> >
> > We now have at least four use-cases of this "raw kernel cred" pattern:
> > core-dumping over unix domain socket, nbd, firmware loading and SCSI
> > target all do this exact thing as far as I can tell.
> >
> > So they all just want that bare kernel cred, and this interface then
> > forces it to do extra work instead of just doing
> >
> > old_cred = override_creds(&init_cred);
> > ...
> > revert_creds(old_cred);
> >
> > and it ends up being extra code for allocating and freeing that copy
> > of a cred that we already *had* and could just have used directly.
> >
> > I did just check that making 'init_cred' be const
>
> Hm, two immediate observations before I go off and write the series.
>
> (1) The thing is that init_cred would have to be exposed to modules via
> EXPORT_SYMBOL() for this to work. It would be easier to just force
> the use of init_task->cred instead.
>
> That pointer deref won't matter in the face of the allocations and
> refcounts we wipe out with this. Then we should also move init_cred
> to init/init_task.c and make it static const. Nobody really needs it
> currently.
>
> (2) I think the plain override_creds() would work but we can do better.
> I envision we can leverage CLASS() to completely hide any access to
> init_cred and force a scope with kernel creds.
>
> /me goess off to write that up.
>
> Ok, so I have it and it survives the coredump socket tests. They are a
> prime example for this sort of thing. Any unprivileged task needs to be
> able to connect to the coredump socket when it coredumps so we override
> credentials only for the path lookup. With my patchset this becomes:
>
> if (flags & SOCK_COREDUMP) {
> struct path root;
>
> task_lock(&init_task);
> get_fs_root(init_task.fs, &root);
> task_unlock(&init_task);
>
> scoped_with_kernel_creds()
> err = vfs_path_lookup(root.dentry, root.mnt, sunaddr->sun_path,
> LOOKUP_BENEATH | LOOKUP_NO_SYMLINKS |
> LOOKUP_NO_MAGICLINKS, &path);
> path_put(&root);
> if (err)
> goto fail;
> } else {
>
> Patches appended.
> Date: Fri, 31 Oct 2025 16:37:35 +0100
> From: Christian Brauner <brauner at kernel.org>
> To: Christian Brauner <brauner at kernel.org>
> Subject: [PATCH 0/6] creds: add {scoped_}with_kernel_creds()
> Message-Id: <20251031-work-creds-init_cred-v1-0-cbf0400d6e0e at kernel.org>
Needs one diff I forgot to fold:
diff --git a/init/init_task.c b/init/init_task.c
index 68059eac9a1e..15288e62334f 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -62,6 +62,9 @@ unsigned long init_shadow_call_stack[SCS_SIZE / sizeof(long)] = {
};
#endif
+/* init to 2 - one for init_task, one to ensure it is never freed */
+static struct group_info init_groups = { .usage = REFCOUNT_INIT(2) };
+
/*
* The initial credentials for the initial task
*/
diff --git a/kernel/cred.c b/kernel/cred.c
index 9ff0b349b80b..ac87ed9d43b1 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -35,9 +35,6 @@ do { \
static struct kmem_cache *cred_jar;
-/* init to 2 - one for init_task, one to ensure it is never freed */
-static struct group_info init_groups = { .usage = REFCOUNT_INIT(2) };
-
/*
* The RCU callback to actually dispose of a set of credentials
*/
More information about the Linux-security-module-archive
mailing list