[PATCH v6] vfs, security: Fix automount superblock LSM init problem, preventing NFS sb sharing

Jeff Layton jlayton at kernel.org
Thu Aug 3 16:09:33 UTC 2023


On Thu, 2023-08-03 at 15:27 +0200, Christian Brauner wrote:
> On Wed, Aug 02, 2023 at 03:34:27PM -0400, Jeff Layton wrote:
> > On Wed, 2023-08-02 at 14:16 -0400, Paul Moore wrote:
> > > On Aug  2, 2023 Jeff Layton <jlayton at kernel.org> wrote:
> > > > 
> > > > When NFS superblocks are created by automounting, their LSM parameters
> > > > aren't set in the fs_context struct prior to sget_fc() being called,
> > > > leading to failure to match existing superblocks.
> > > > 
> > > > Fix this by adding a new LSM hook to load fc->security for submount
> > > > creation when alloc_fs_context() is creating the fs_context for it.
> > > > 
> > > > However, this uncovers a further bug: nfs_get_root() initialises the
> > > > superblock security manually by calling security_sb_set_mnt_opts() or
> > > > security_sb_clone_mnt_opts() - but then vfs_get_tree() calls
> > > > security_sb_set_mnt_opts(), which can lead to SELinux, at least,
> > > > complaining.
> > > > 
> > > > Fix that by adding a flag to the fs_context that suppresses the
> > > > security_sb_set_mnt_opts() call in vfs_get_tree().  This can be set by NFS
> > > > when it sets the LSM context on the new superblock.
> > > > 
> > > > The first bug leads to messages like the following appearing in dmesg:
> > > > 
> > > > 	NFS: Cache volume key already in use (nfs,4.2,2,108,106a8c0,1,,,,100000,100000,2ee,3a98,1d4c,3a98,1)
> > > > 
> > > > Signed-off-by: David Howells <dhowells at redhat.com>
> > > > Signed-off-by: Jeff Layton <jlayton at kernel.org>
> > > > Fixes: 9bc61ab18b1d ("vfs: Introduce fs_context, switch vfs_kern_mount() to it.")
> > > > Fixes: 779df6a5480f ("NFS: Ensure security label is set for root inode)
> > > > Tested-by: Jeff Layton <jlayton at kernel.org>
> > > > Reviewed-by: Jeff Layton <jlayton at kernel.org>
> > > > Acked-by: Casey Schaufler <casey at schaufler-ca.com>
> > > > Acked-by: "Christian Brauner (Microsoft)" <brauner at kernel.org>
> > > > Link: https://lore.kernel.org/r/165962680944.3334508.6610023900349142034.stgit@warthog.procyon.org.uk/ # v1
> > > > Link: https://lore.kernel.org/r/165962729225.3357250.14350728846471527137.stgit@warthog.procyon.org.uk/ # v2
> > > > Link: https://lore.kernel.org/r/165970659095.2812394.6868894171102318796.stgit@warthog.procyon.org.uk/ # v3
> > > > Link: https://lore.kernel.org/r/166133579016.3678898.6283195019480567275.stgit@warthog.procyon.org.uk/ # v4
> > > > Link: https://lore.kernel.org/r/217595.1662033775@warthog.procyon.org.uk/ # v5
> > > > ---
> > > > This patch was originally sent by David several months ago, but it
> > > > never got merged. I'm resending to resurrect the discussion. Can we
> > > > get this fixed?
> > > 
> > > Sorry, I sorta lost track of this after the ROOTCONTEXT_MNT discussion
> > > back in v3.  Looking at it a bit closer now I have one nitpicky
> > > request and one larger concern (see below).
> > > 
> > > > diff --git a/fs/super.c b/fs/super.c
> > > > index e781226e2880..13adf43e2e5d 100644
> > > > --- a/fs/super.c
> > > > +++ b/fs/super.c
> > > > @@ -1541,10 +1541,12 @@ int vfs_get_tree(struct fs_context *fc)
> > > >  	smp_wmb();
> > > >  	sb->s_flags |= SB_BORN;
> > > >  
> > > > -	error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> > > > -	if (unlikely(error)) {
> > > > -		fc_drop_locked(fc);
> > > > -		return error;
> > > > +	if (!(fc->lsm_set)) {
> > > > +		error = security_sb_set_mnt_opts(sb, fc->security, 0, NULL);
> > > > +		if (unlikely(error)) {
> > > > +			fc_drop_locked(fc);
> > > > +			return error;
> > > > +		}
> > > >  	}
> > > 
> > > I generally dislike core kernel code which makes LSM calls conditional
> > > on some kernel state maintained outside the LSM.  Sometimes it has to
> > > be done as there is no other good options, but I would like us to try
> > > and avoid it if possible.  The commit description mentioned that this
> > > was put here to avoid a SELinux complaint, can you provide an example
> > > of the complain?  Does it complain about a double/invalid mount, e.g.
> > > "SELinux: mount invalid.  Same superblock, different security ..."?
> > > 
> > 
> > The problem I had was not so much SELinux warnings, but rather that in a
> > situation where I would expect to share superblocks between two
> > filesystems, it didn't.
> > 
> > Basically if you do something like this:
> > 
> > # mount nfsserver:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0
> > # mount nfsserver:/export/bar /mnt/bar -o context=system_u:object_r:root_t:s0
> > 
> > ...when "foo" and "bar" are directories on the same filesystem on the
> > server, you should get two vfsmounts that share a superblock. That's
> > what you get if selinux is disabled, but not when it's enabled (even
> > when it's in permissive mode).
> > 
> > The problems that David hit with the automounter have a similar root
> > cause though, I believe.
> > 
> > > I'd like to understand why the sb_set_mnt_opts() call fails when it
> > > comes after the fs_context_init() call.  I'm particulary curious to
> > > know if the failure is due to conflicting SELinux state in the
> > > fs_context, or if it is simply an issue of sb_set_mnt_opts() not
> > > properly handling existing values.  Perhaps I'm being overly naive,
> > > but I'm hopeful that we can address both of these within the SELinux
> > > code itself.
> > > 
> > 
> > The problem I hit was that nfs_compare_super is called with a fs_context
> > that has a NULL ->security pointer. That caused it to call
> > selinux_sb_mnt_opts_compat with mnt_opts set to NULL, and at that point
> > it returns 1 and decides not to share sb's.
> 
> I tried to follow this because I'm really still quite puzzled by this
> whole thing. Two consecutive mounts that should share the superblock
> don't share the superblock. But behavior differs between nfs3 and nfs4
> due to how automounting works.
> 
> Afaict, the callchain you're looking at in this scenario is:
> 
> (1) nfs3
> 
> (1.1) mount 127.0.0.1:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0,nfsvers=3
>       vfs_get_tree(fc_foo)
>       -> fs_contex_operations->get_tree::nfs_get_tree(fc_foo)
>             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs_try_get_tree(fc_foo)
>                -> nfs_get_tree_common(fc_foo)
>                   -> sb_foo = sget_fc(fc_foo, nfs_compare_super, ...)
> 
> (1.2) mount 127.0.0.1:/export/bar /mnt/bar -o context=system_u:object_r:root_t:s0,nfsvers=3
>       vfs_get_tree(fc_bar)
>       -> fs_contex_operations->get_tree::nfs_get_tree(fc_bar)
>             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs_try_get_tree(fc_bar)
>                -> nfs_get_tree_common(fc_bar)
>                   -> sb_foo = sget_fc(fc_bar, nfs_compare_super, ...)
>                      -> nfs_compare_super(sb_foo, fc_bar)
>                         -> selinux_sb_mnt_opts_compat(sb_foo, fc_bar->security)
> 
> And fc_bar->security is non-NULL and compatible with sb_foo's current
> security settings. Fine.
> 
> (2) nfs4
> 
> But for nfs4 we're looking at a vastly more complicated callchain at
> least looking at this from a local nfs:
> 
> (2.1) mount 127.0.0.1:/export/foo /mnt/foo -o context=system_u:object_r:root_t:s0
>       vfs_get_tree(fc_foo)
>       -> fs_contex_operations->get_tree::nfs_get_tree(fc_foo)
>          -> if (!ctx->internal) branch is taken
>             -> ctx->nfs_mod->rpc_ops->try_get_tree::nfs4_try_get_tree(fc_foo)
>                -> do_nfs4_mount(fc_foo)
>                   -> fc_dup_foo = vfs_dup_fs_context(fc_foo)
>                     -> security_fs_context_dup(fc_dup_foo, fc_foo)
>                        {
>                                 fc_dup_foo->security = kmemdup(fc_foo->security)
>                        }
>                        new_fs_context->internal = true
>                   -> foo_mnt = fc_mount(fc_dup_foo)
>                     -> vfs_get_tree(fc_dup_foo)
>                        -> if (!ctx->internal) branch is _not_ taken
>                           -> nfs_get_tree_common(fc_dup_foo)
>                                  sb_foo = sget_fc(fc, nfs_compare_super, ...)
>                   -> mount_subtree()
>                      -> vfs_path_lookup(..., "/export/foo", LOOKUP_AUTOMOUNT)
>                         -> nfs_d_automount("export")
>                            -> fc_sub_foo = fs_context_for_submount()
>                               {
>                                       fc_sub_bar->security = NULL


Should the above be:

					fc_sub_foo->security = NULL;

?

If so, then with this patch, the above would no longer be NULL. We'd
inherit the security context info from the reference dentry passed to
fs_context_for_submount().

>                               {
>                            -> nfs4_submount(fc_sub_foo)
>                               -> nfs4_do_submount(fc_sub_foo)
>                                  -> vfs_get_tree(fc_sub_foo)
>                                     -> nfs_get_tree_common(fc_sub_foo)
>                                        -> sb_foo_2 = sget_fc(fc_sub_foo, nfs_compare_super, ...)
>                         -> nfs_d_automount("foo")
>                            -> fc_sub_foo = fs_context_for_submount()
>                               {
>                                       fc_sub_bar->security = NULL

Ditto here -- that should be fc_sub_foo , correct?
>                               {
>                            -> nfs4_submount(fc_sub_foo)
>                               -> nfs4_do_submount(fc_sub_foo)
>                                  -> vfs_get_tree(fc_sub_foo)
>                                     -> nfs_get_tree_common(fc_sub_foo)
>              |--------------------------> sb_foo_3 = sget_fc(fc_sub_foo, nfs_compare_super, ...)
>              |
> As far as I can see you're already allocating 3 separate superblocks of
> which two are discarded and only one survives. Afaict, the one that
> survives is _| the last one. Under the assumption that I'm correct,
> where does the third superblock get it's selinux context from given that
> fc->security isn't even set during submount?
> 

That's the problem this patch is intended to fix. It allows child mounts
to properly inherit security options from a parent dentry.

> And where is the context=%s output generated for mountinfo?
> 

security_sb_show_options / selinux_sb_show_options

> Is this a correct callchain?
> 

I think it looks about right, but I didn't verify the details to the
degree you have.

> > 
> > Filling out fc->security with this new operation seems to fix that, but
> > if you see a better way to do this, then I'm certainly open to the idea.
> > 
> > > In a worst case situation, we could always implement a flag *inside*
> > > the SELinux code, similar to what has been done with 'lsm_set' here.
> > > 
> > 
> > I'm fine with a different solution, if you see a better one. You'll have
> 
> Independent of the modification in fs_context_for_submount() you might want to
> think about something like:
> 
> static const struct fs_context_operations nfs4_fs_context_ops = {
>       .free           = nfs4_free,
>       .parse_param    = nfs4_parse_param,
>       .get_tree       = nfs4_get_tree,
> };
> 
> static const struct fs_context_operations nfs4_fs_submount_ops = {
>       .free           = nfs4_free_submount,
>       .parse_param    = nfs4_parse_param_submount,
>       .get_tree       = nfs4_get_tree_submount,
> };
> 
> static int nfs4_init_fs_context_submount(struct fs_context *fc)
> {
>         return 0;
> }
> 
> static int nfs4_fs_context_get_tree(struct fs_context *fc)
> {
>         if (fc->purpose == FS_CONTEXT_FOR_SUBMOUNT)
>                 fc->ops = &nfs4_fs_submount_ops;
>         else
>                 fc->ops = &nfs4_fs_context_ops;
>         .
>         .
>         .
> }
> 
> which will make the callchain probably a lot to follow instead of wafting
> through the same nested functions over and over. But just a thought.

Sounds reasonable. I'd rather do that sort of cleanup afterward though,
to make this patch easier to eventually backport.
-- 
Jeff Layton <jlayton at kernel.org>



More information about the Linux-security-module-archive mailing list