[PATCH v13 01/10] landlock: Add IOCTL access right for character and block devices

Günther Noack gnoack at google.com
Fri Apr 5 16:17:17 UTC 2024


On Wed, Apr 03, 2024 at 01:15:45PM +0200, Mickaël Salaün wrote:
> On Tue, Apr 02, 2024 at 08:28:49PM +0200, Günther Noack wrote:
> > On Wed, Mar 27, 2024 at 05:57:31PM +0100, Mickaël Salaün wrote:
> > > On Wed, Mar 27, 2024 at 01:10:31PM +0000, Günther Noack wrote:
> > > > +	case FIOQSIZE:
> > > > +		/*
> > > > +		 * FIOQSIZE queries the size of a regular file or directory.
> > > > +		 *
> > > > +		 * This IOCTL command only applies to regular files and
> > > > +		 * directories.
> > > > +		 */
> > > > +		return LANDLOCK_ACCESS_FS_IOCTL_DEV;
> > > 
> > > This should always be allowed because do_vfs_ioctl() never returns
> > > -ENOIOCTLCMD for this command.  That's why I wrote
> > > vfs_masked_device_ioctl() this way [1].  I think it would be easier to
> > > read and maintain this code with a is_masked_device_ioctl() logic.  Listing
> > > commands that are not masked makes it difficult to review because
> > > allowed and denied return codes are interleaved.
> > 
> > Oh, I misunderstood you on [2], I think -- I was under the impression that you
> > wanted to keep the switch case in the same order (and with the same entries?) as
> > the original in do_vfs_ioctl.  So you'd prefer to only list the always-allowed
> > IOCTL commands here, as you have done in vfs_masked_device_ioctl() [3]?
> > 
> > [2] https://lore.kernel.org/all/20240326.ooCheem1biV2@digikod.net/
> > [3] https://lore.kernel.org/all/20240219183539.2926165-1-mic@digikod.net/
> 
> That was indeed unclear.  About IOCTL commands, the same order ease
> reviewing and maintenance but we don't need to list all commands,
> which will limit updates of this list.  However, for the current
> unused/unmasked one, we can still add them very briefly in comments as I
> did with FIONREAD and file_ioctl()'s ones in vfs_masked_device_ioctl().
> Only listing the "masked" ones (for device case) shorten the list, and
> having a list with the same semantic ("mask device-specific IOCTLs")
> ease review and maintenance as well.
> 
> > 
> > Can you please clarify how you make up your mind about what should be permitted
> > and what should not?  I have trouble understanding the rationale for the changes
> > that you asked for below, apart from the points that they are harmless and that
> > the return codes should be consistent.
> 
> The rationale is the same: all IOCTL commands that are not
> passed/specific to character or block devices (i.e. IOCTLs defined in
> fs/ioctl.c) are allowed.  vfs_masked_device_ioctl() returns true if the
> IOCTL command is not passed to the related device driver but handled by
> fs/ioctl.c instead (i.e. handled by the VFS layer).

Thanks for clarifying -- this makes more sense now.  I traced the cases with
-ENOIOCTLCMD through the code more thoroughly and it is more aligned now with
what you implemented before.  The places where I ended up implementing it
differently to your vfs_masked_device_ioctl() patch are:

 * Do not blanket-permit FS_IOC_{GET,SET}{FLAGS,XATTR}.
   They fall back to the device implementation.

 * FS_IOC_GETUUID and FS_IOC_GETFSSYSFSPATH are now handled.
   These return -ENOIOCTLCMD from do_vfs_ioctl(), so they do fall back to the
   handlers in struct file_operations, so we can not permit these either.

These seem like pretty clear cases to me.


> > The criteria that I have used in this patch set are that (a) it is implemented
> > in do_vfs_ioctl() rather than further below, and (b) it makes sense to use that
> > command on a device file.  (If we permit FIOQSIZE, FS_IOC_FIEMAP and others
> > here, we will get slightly more correct error codes in these cases, but the
> > IOCTLs will still not work, because they are not useful and not implemented for
> > devices. -- On the other hand, we are also increasing the exposed code surface a
> > bit.  For example, FS_IOC_FIEMAP is calling into inode->i_op->fiemap().  That is
> > probably harmless for device files, but requires us to reason at a deeper level
> > to convince ourselves of that.)
> 
> FIOQSIZE is fully handled by do_vfs_ioctl(), and FS_IOC_FIEMAP is
> implemented as the inode level, so it should not be passed at the struct
> file/device level unless ENOIOCTLCMD is returned (but it should not,
> right?).  Because it depends on the inode implementation, it looks like
> this IOCTL may work (in theory) on character or block devices too.  If
> this is correct, we should not deny it because the semantic of
> LANDLOCK_ACCESS_FS_IOCTL_DEV is to control IOCTLs passed to device
> drivers.  Furthermore, as you pointed out, error codes would be
> unaltered.
> 
> It would be good to test (as you suggested IIRC) the masked commands on
> a simple device (e.g. /dev/null) to check that it returns ENOTTY,
> EOPNOTSUPP, or EACCES according to our expectations.

Sounds good, I'll add a test.


> I agree that this would increase a bit the exposed code surface but I'm
> pretty sure that if a sandboxed process is allowed to access a device
> file, it is also allowed to access directory or other file types as well
> and then would still be able to reach the FS_IOC_FIEMAP implementation.

I assume you mean FIGETBSZ?  The FS_IOC_FIEMAP IOCTL is the one that returns
file extent maps, so that user space can reason about whether a file is stored
in a consecutive way on disk.


> I'd like to avoid exceptions as in the current implementation of
> get_required_ioctl_dev_access() with a switch/case either returning 0 or
> LANDLOCK_ACCESS_FS_IOCTL_DEV (excluding the default case of course).  An
> alternative approach would be to group IOCTL command cases according to
> their returned value, but I find it a bit more complex for no meaningful
> gain.  What do you think?

I don't have strong opinions about it, as long as we don't accidentally mess up
the fallbacks if this changes.


> > In your implementation at [3], you were permitting FICLONE* and FIDEDUPERANGE,
> > but not FS_IOC_ZERO_RANGE, which is like fallocate().  How are these cases
> > different to each other?  Is that on purpose?
> 
> FICLONE* and FIDEDUPERANGE match device files and the
> vfs_clone_file_range()/generic_file_rw_checks() check returns EINVAL for
> device files.  So there is no need to add exceptions for these commands.
> 
> FS_IOC_ZERO_RANGE is only implemented for regular files (see
> file_ioctl() call), so it is passed to device files.

Makes sense :)


—Günther



More information about the Linux-security-module-archive mailing list