[PATCH] overlayfs: Trigger file re-evaluation by IMA / EVM after writes

Christian Brauner brauner at kernel.org
Fri Apr 7 08:31:30 UTC 2023


On Fri, Apr 07, 2023 at 09:42:43AM +0300, Amir Goldstein wrote:
> On Thu, Apr 6, 2023 at 11:23 PM Stefan Berger <stefanb at linux.ibm.com> wrote:
> >
> >
> >
> > On 4/6/23 15:37, Jeff Layton wrote:
> > > On Thu, 2023-04-06 at 15:11 -0400, Stefan Berger wrote:
> > >>
> > >> On 4/6/23 14:46, Jeff Layton wrote:
> > >>> On Thu, 2023-04-06 at 17:01 +0200, Christian Brauner wrote:
> > >>>> On Thu, Apr 06, 2023 at 10:36:41AM -0400, Paul Moore wrote:
> > >>
> > >>>
> > >>> Correct. As long as IMA is also measuring the upper inode then it seems
> > >>> like you shouldn't need to do anything special here.
> > >>
> > >> Unfortunately IMA does not notice the changes. With the patch provided in the other email IMA works as expected.
> > >>
> > >
> > >
> > > It looks like remeasurement is usually done in ima_check_last_writer.
> > > That gets called from __fput which is called when we're releasing the
> > > last reference to the struct file.
> > >
> > > You've hooked into the ->release op, which gets called whenever
> > > filp_close is called, which happens when we're disassociating the file
> > > from the file descriptor table.
> > >
> > > So...I don't get it. Is ima_file_free not getting called on your file
> > > for some reason when you go to close it? It seems like that should be
> > > handling this.
> >
> > I would ditch the original proposal in favor of this 2-line patch shown here:
> >
> > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232
> >
> >
> > The new proposed i_version increase occurs on the inode that IMA sees later on for
> > the file that's being executed and on which it must do a re-evaluation.
> >
> > Upon file changes ima_inode_free() seems to see two ima_file_free() calls,
> > one for what seems to be the upper layer (used for vfs_* functions below)
> > and once for the lower one.
> > The important thing is that IMA will see the lower one when the file gets
> > executed later on and this is the one that I instrumented now to have its
> > i_version increased, which in turn triggers the re-evaluation of the file post
> > modification.
> >
> > static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
> > [...]
> >         struct fd real;
> > [...]
> >         ret = ovl_real_fdget(file, &real);
> >         if (ret)
> >                 goto out_unlock;
> >
> > [...]
> >         if (is_sync_kiocb(iocb)) {
> >                 file_start_write(real.file);
> > -->             ret = vfs_iter_write(real.file, iter, &iocb->ki_pos,
> >                                      ovl_iocb_to_rwf(ifl));
> >                 file_end_write(real.file);
> >                 /* Update size */
> >                 ovl_copyattr(inode);
> >         } else {
> >                 struct ovl_aio_req *aio_req;
> >
> >                 ret = -ENOMEM;
> >                 aio_req = kmem_cache_zalloc(ovl_aio_request_cachep, GFP_KERNEL);
> >                 if (!aio_req)
> >                         goto out;
> >
> >                 file_start_write(real.file);
> >                 /* Pacify lockdep, same trick as done in aio_write() */
> >                 __sb_writers_release(file_inode(real.file)->i_sb,
> >                                      SB_FREEZE_WRITE);
> >                 aio_req->fd = real;
> >                 real.flags = 0;
> >                 aio_req->orig_iocb = iocb;
> >                 kiocb_clone(&aio_req->iocb, iocb, real.file);
> >                 aio_req->iocb.ki_flags = ifl;
> >                 aio_req->iocb.ki_complete = ovl_aio_rw_complete;
> >                 refcount_set(&aio_req->ref, 2);
> > -->             ret = vfs_iocb_iter_write(real.file, &aio_req->iocb, iter);
> >                 ovl_aio_put(aio_req);
> >                 if (ret != -EIOCBQUEUED)
> >                         ovl_aio_cleanup_handler(aio_req);
> >         }
> >          if (ret > 0)                                           <--- this get it to work
> >                  inode_maybe_inc_iversion(inode, false);                <--- since this inode is known to IMA
> 
> If the aio is queued, then I think increasing i_version here may be premature.
> 
> Note that in this code flow, the ovl ctime is updated in
> ovl_aio_cleanup_handler() => ovl_copyattr()
> after file_end_write(), similar to the is_sync_kiocb() code patch.
> 
> It probably makes most sense to include i_version in ovl_copyattr().
> Note that this could cause ovl i_version to go backwards on copy up
> (i.e. after first open for write) when switching from the lower inode
> i_version to the upper inode i_version.
> 
> Jeff's proposal to use vfs_getattr_nosec() in IMA code is fine too.
> It will result in the same i_version jump.
> 
> IMO it wouldn't hurt to have a valid i_version value in the ovl inode
> as well. If the ovl inode values would not matter, we would not have
> needed  ovl_copyattr() at all, but it's not good to keep vfs in the dark...
> 
> Thanks,
> Amir.

On Thu, Apr 06, 2023 at 05:24:11PM -0400, Jeff Layton wrote:
> On Thu, 2023-04-06 at 16:22 -0400, Stefan Berger wrote:
> > 
> > On 4/6/23 15:37, Jeff Layton wrote:
> > > On Thu, 2023-04-06 at 15:11 -0400, Stefan Berger wrote:
> > > > 
> > > > On 4/6/23 14:46, Jeff Layton wrote:
> > > > > On Thu, 2023-04-06 at 17:01 +0200, Christian Brauner wrote:
> > > > > > On Thu, Apr 06, 2023 at 10:36:41AM -0400, Paul Moore wrote:
> > > > 
> > > > > 
> > > > > Correct. As long as IMA is also measuring the upper inode then it seems
> > > > > like you shouldn't need to do anything special here.
> > > > 
> > > > Unfortunately IMA does not notice the changes. With the patch provided in the other email IMA works as expected.
> > > > 
> > > 
> > > 
> > > It looks like remeasurement is usually done in ima_check_last_writer.
> > > That gets called from __fput which is called when we're releasing the
> > > last reference to the struct file.
> > > 
> > > You've hooked into the ->release op, which gets called whenever
> > > filp_close is called, which happens when we're disassociating the file
> > > from the file descriptor table.
> > > 
> > > So...I don't get it. Is ima_file_free not getting called on your file
> > > for some reason when you go to close it? It seems like that should be
> > > handling this.
> > 
> > I would ditch the original proposal in favor of this 2-line patch shown here:
> > 
> > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232
> > 
> > 
> 
> Ok, I think I get it. IMA is trying to use the i_version from the
> overlayfs inode.

Which is likely to give wrong results and I agree with you that it
should rely on vfs_getattr_nosec().

> 
> I suspect that the real problem here is that IMA is just doing a bare
> inode_query_iversion. Really, we ought to make IMA call
> vfs_getattr_nosec (or something like it) to query the getattr routine in
> the upper layer. Then overlayfs could just propagate the results from
> the upper layer in its response.
> 
> That sort of design may also eventually help IMA work properly with more
> exotic filesystems, like NFS or Ceph.
> 
> > The new proposed i_version increase occurs on the inode that IMA sees later on for
> > the file that's being executed and on which it must do a re-evaluation.
> > 
> > Upon file changes ima_inode_free() seems to see two ima_file_free() calls,
> > one for what seems to be the upper layer (used for vfs_* functions below)
> > and once for the lower one.
> > The important thing is that IMA will see the lower one when the file gets
> > executed later on and this is the one that I instrumented now to have its
> > i_version increased, which in turn triggers the re-evaluation of the file post
> > modification.
> > 
> > static ssize_t ovl_write_iter(struct kiocb *iocb, struct iov_iter *iter)
> > [...]
> > 	struct fd real;
> > [...]
> > 	ret = ovl_real_fdget(file, &real);
> > 	if (ret)
> > 		goto out_unlock;
> > 
> > [...]
> > 	if (is_sync_kiocb(iocb)) {
> > 		file_start_write(real.file);
> > -->		ret = vfs_iter_write(real.file, iter, &iocb->ki_pos,
> > 				     ovl_iocb_to_rwf(ifl));
> > 		file_end_write(real.file);
> > 		/* Update size */
> > 		ovl_copyattr(inode);
> > 	} else {
> > 		struct ovl_aio_req *aio_req;
> > 
> > 		ret = -ENOMEM;
> > 		aio_req = kmem_cache_zalloc(ovl_aio_request_cachep, GFP_KERNEL);
> > 		if (!aio_req)
> > 			goto out;
> > 
> > 		file_start_write(real.file);
> > 		/* Pacify lockdep, same trick as done in aio_write() */
> > 		__sb_writers_release(file_inode(real.file)->i_sb,
> > 				     SB_FREEZE_WRITE);
> > 		aio_req->fd = real;
> > 		real.flags = 0;
> > 		aio_req->orig_iocb = iocb;
> > 		kiocb_clone(&aio_req->iocb, iocb, real.file);
> > 		aio_req->iocb.ki_flags = ifl;
> > 		aio_req->iocb.ki_complete = ovl_aio_rw_complete;
> > 		refcount_set(&aio_req->ref, 2);
> > -->		ret = vfs_iocb_iter_write(real.file, &aio_req->iocb, iter);
> > 		ovl_aio_put(aio_req);
> > 		if (ret != -EIOCBQUEUED)
> > 			ovl_aio_cleanup_handler(aio_req);
> > 	}
> >          if (ret > 0)						<--- this get it to work
> >                  inode_maybe_inc_iversion(inode, false);		<--- since this inode is known to IMA
> > out:
> >          revert_creds(old_cred);
> > out_fdput:
> >          fdput(real);
> > 
> > out_unlock:
> >          inode_unlock(inode);
> > 
> > 
> > 
> > 
> >     Stefan
> > 
> > > 
> > > In any case, I think this could use a bit more root-cause analysis.
> > 
> 
> -- 
> Jeff Layton <jlayton at kernel.org>

On Thu, Apr 06, 2023 at 06:04:36PM -0400, Jeff Layton wrote:
> On Thu, 2023-04-06 at 17:24 -0400, Jeff Layton wrote:
> > On Thu, 2023-04-06 at 16:22 -0400, Stefan Berger wrote:
> > > 
> > > On 4/6/23 15:37, Jeff Layton wrote:
> > > > On Thu, 2023-04-06 at 15:11 -0400, Stefan Berger wrote:
> > > > > 
> > > > > On 4/6/23 14:46, Jeff Layton wrote:
> > > > > > On Thu, 2023-04-06 at 17:01 +0200, Christian Brauner wrote:
> > > > > > > On Thu, Apr 06, 2023 at 10:36:41AM -0400, Paul Moore wrote:
> > > > > 
> > > > > > 
> > > > > > Correct. As long as IMA is also measuring the upper inode then it seems
> > > > > > like you shouldn't need to do anything special here.
> > > > > 
> > > > > Unfortunately IMA does not notice the changes. With the patch provided in the other email IMA works as expected.
> > > > > 
> > > > 
> > > > 
> > > > It looks like remeasurement is usually done in ima_check_last_writer.
> > > > That gets called from __fput which is called when we're releasing the
> > > > last reference to the struct file.
> > > > 
> > > > You've hooked into the ->release op, which gets called whenever
> > > > filp_close is called, which happens when we're disassociating the file
> > > > from the file descriptor table.
> > > > 
> > > > So...I don't get it. Is ima_file_free not getting called on your file
> > > > for some reason when you go to close it? It seems like that should be
> > > > handling this.
> > > 
> > > I would ditch the original proposal in favor of this 2-line patch shown here:
> > > 
> > > https://lore.kernel.org/linux-integrity/a95f62ed-8b8a-38e5-e468-ecbde3b221af@linux.ibm.com/T/#m3bd047c6e5c8200df1d273c0ad551c645dd43232

We should cool it with the quick hacks to fix things. :)

> > > 
> > > 
> > 
> > Ok, I think I get it. IMA is trying to use the i_version from the
> > overlayfs inode.
> > 
> > I suspect that the real problem here is that IMA is just doing a bare
> > inode_query_iversion. Really, we ought to make IMA call
> > vfs_getattr_nosec (or something like it) to query the getattr routine in
> > the upper layer. Then overlayfs could just propagate the results from
> > the upper layer in its response.
> > 
> > That sort of design may also eventually help IMA work properly with more
> > exotic filesystems, like NFS or Ceph.
> > 
> > 
> > 
> 
> Maybe something like this? It builds for me but I haven't tested it. It
> looks like overlayfs already should report the upper layer's i_version
> in getattr, though I haven't tested that either:
> 
> -----------------------8<---------------------------
> 
> [PATCH] IMA: use vfs_getattr_nosec to get the i_version
> 
> IMA currently accesses the i_version out of the inode directly when it
> does a measurement. This is fine for most simple filesystems, but can be
> problematic with more complex setups (e.g. overlayfs).
> 
> Make IMA instead call vfs_getattr_nosec to get this info. This allows
> the filesystem to determine whether and how to report the i_version, and
> should allow IMA to work properly with a broader class of filesystems in
> the future.
> 
> Reported-by: Stefan Berger <stefanb at linux.ibm.com>
> Signed-off-by: Jeff Layton <jlayton at kernel.org>
> ---

So, I think we want both; we want the ovl_copyattr() and the
vfs_getattr_nosec() change:

(1) overlayfs should copy up the inode version in ovl_copyattr(). That
    is in line what we do with all other inode attributes. IOW, the
    overlayfs inode's i_version counter should aim to mirror the
    relevant layer's i_version counter. I wouldn't know why that
    shouldn't be the case. Asking the other way around there doesn't
    seem to be any use for overlayfs inodes to have an i_version that
    isn't just mirroring the relevant layer's i_version.
(2) Jeff's changes for ima to make it rely on vfs_getattr_nosec().
    Currently, ima assumes that it will get the correct i_version from
    an inode but that just doesn't hold for stacking filesystem.

While (1) would likely just fix the immediate bug (2) is correct and
_robust_. If we change how attributes are handled vfs_*() helpers will
get updated and ima with it. Poking at raw inodes without using
appropriate helpers is much more likely to get ima into trouble.

Christian



More information about the Linux-security-module-archive mailing list