[GIT PULL] Block fixes for 6.18-rc3

Fri Oct 31 15:43:54 UTC 2025

On Fri, Oct 24, 2025 at 01:31:11PM -0700, Linus Torvalds wrote:
> [ Adding LSM people. Also Christian, because he did the cred refcount

Sorry, late to the party. I was working on other stuf. Let me see...

> cleanup with override_creds() and friends last year, and I'm
> suggesting taking that one step further ]
> 
> On Fri, 24 Oct 2025 at 06:58, Jens Axboe <axboe at kernel.dk> wrote:
> >
> > Ondrej Mosnacek (1):
> >       nbd: override creds to kernel when calling sock_{send,recv}msg()
> 
> I've pulled this, but looking at the patch, I note that more than half
> the patch - 75% to be exact - is just boilerplate for "I need to
> allocate the kernel cred and deal with error handling there".
> 
> It literally has three lines of new actual useful code (two statements
> and one local variable declaration), and then nine lines of the "setup
> dance".
> 
> Which isn't wrong, but when the infrastructure boilerplate is three
> times more than the actual code, it makes me think we should maybe
> just get rid of the
> 
>     my_kernel_cred = prepare_kernel_cred(&init_task);
> 
> pattern for this use-case, and just let people use "init_cred"
> directly for things like this.
> 
> Because that's essentially what that prepare_kernel_cred() thing
> returns, except it allocates a new copy of said thing, so now you have
> error handling and you have to free it after-the-fact.
> 
> And I'm not seeing that the extra error handling and freeing dance
> actually buys us anything at all.
> 
> Now, some *other* users actually go on to change the creds: they want
> that prepare_kernel_cred() dance because they then actually do
> something else like using their own keyring or whatever (eg the NFS
> idmap code or some other filesystem stuff).
> 
> So it's not like prepare_kernel_cred() is wrong, but in this kind of
> case where people just go "I'm a driver with hardware access, I want
> to do something with kernel privileges not user privileges", it
> actually seems counterproductive to have extra code just to complicate
> things.
> 
> Now, my gut feel is that if we just let people use 'init_cred'
> directly, we should also make sure that it's always exposed as a
> 'const struct cred' , but wouldn't that be a whole lot simpler and
> more straightforward?
> 
> This is *not* the only use case of that.
> 
> We now have at least four use-cases of this "raw kernel cred" pattern:
> core-dumping over unix domain socket, nbd, firmware loading and SCSI
> target all do this exact thing as far as I can tell.
> 
> So  they all just want that bare kernel cred, and this interface then
> forces it to do extra work instead of just doing
> 
>         old_cred = override_creds(&init_cred);
>         ...
>         revert_creds(old_cred);
> 
> and it ends up being extra code for allocating and freeing that copy
> of a cred that we already *had* and could just have used directly.
> 
> I did just check that making 'init_cred' be const

Hm, two immediate observations before I go off and write the series.

(1) The thing is that init_cred would have to be exposed to modules via
    EXPORT_SYMBOL() for this to work. It would be easier to just force
    the use of init_task->cred instead.

    That pointer deref won't matter in the face of the allocations and
    refcounts we wipe out with this. Then we should also move init_cred
    to init/init_task.c and make it static const. Nobody really needs it
    currently.

(2) I think the plain override_creds() would work but we can do better.
    I envision we can leverage CLASS() to completely hide any access to
    init_cred and force a scope with kernel creds.

/me goess off to write that up.

Ok, so I have it and it survives the coredump socket tests. They are a
prime example for this sort of thing. Any unprivileged task needs to be
able to connect to the coredump socket when it coredumps so we override
credentials only for the path lookup. With my patchset this becomes:

        if (flags & SOCK_COREDUMP) {
                struct path root;

                task_lock(&init_task);
                get_fs_root(init_task.fs, &root);
                task_unlock(&init_task);

                scoped_with_kernel_creds() 
			err = vfs_path_lookup(root.dentry, root.mnt, sunaddr->sun_path,
					      LOOKUP_BENEATH | LOOKUP_NO_SYMLINKS |
					      LOOKUP_NO_MAGICLINKS, &path);
                path_put(&root);
                if (err)
                        goto fail;
        } else {

Patches appended.