[PATCH v7 1/8] fs: introduce kernel_pread_file* support

Matthew Wilcox willy at infradead.org
Mon Jun 8 13:16:30 UTC 2020


On Mon, Jun 08, 2020 at 09:03:21AM -0400, Mimi Zohar wrote:
> On Sat, 2020-06-06 at 08:52 -0700, Matthew Wilcox wrote:
> > On Fri, Jun 05, 2020 at 10:04:51PM -0700, Scott Branden wrote:
> > > -int kernel_read_file(struct file *file, void **buf, loff_t *size,
> > > -		     loff_t max_size, enum kernel_read_file_id id)
> > > -{
> > > -	loff_t i_size, pos;
> > > +int kernel_pread_file(struct file *file, void **buf, loff_t *size,
> > > +		      loff_t pos, loff_t max_size,
> > > +		      enum kernel_pread_opt opt,
> > > +		      enum kernel_read_file_id id)
> > > +{
> > > +	loff_t alloc_size;
> > > +	loff_t buf_pos;
> > > +	loff_t read_end;
> > > +	loff_t i_size;
> > >  	ssize_t bytes = 0;
> > >  	int ret;
> > >  
> > 
> > Look, it's not your fault, but this is a great example of how we end
> > up with atrocious interfaces.  Someone comes along and implements a
> > simple DWIM interface that solves their problem.  Then somebody else
> > adds a slight variant that solves their problem, and so on and so on,
> > and we end up with this bonkers API where the arguments literally change
> > meaning depending on other arguments.
> > 
> > > @@ -950,21 +955,31 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
> > >  		ret = -EINVAL;
> > >  		goto out;
> > >  	}
> > > -	if (i_size > SIZE_MAX || (max_size > 0 && i_size > max_size)) {
> > > +
> > > +	/* Default read to end of file */
> > > +	read_end = i_size;
> > > +
> > > +	/* Allow reading partial portion of file */
> > > +	if ((opt == KERNEL_PREAD_PART) &&
> > > +	    (i_size > (pos + max_size)))
> > > +		read_end = pos + max_size;
> > > +
> > > +	alloc_size = read_end - pos;
> > > +	if (i_size > SIZE_MAX || (max_size > 0 && alloc_size > max_size)) {
> > >  		ret = -EFBIG;
> > >  		goto out;
> > 
> > ... like that.
> > 
> > I think what we actually want is:
> > 
> > ssize_t vmap_file_range(struct file *, loff_t start, loff_t end, void **bufp);
> > void vunmap_file_range(struct file *, void *buf);
> > 
> > If end > i_size, limit the allocation to i_size.  Returns the number
> > of bytes allocated, or a negative errno.  Writes the pointer allocated
> > to *bufp.  Internally, it should use the page cache to read in the pages
> > (taking appropriate reference counts).  Then it maps them using vmap()
> > instead of copying them to a private vmalloc() array.
> > 
> > kernel_read_file() can be converted to use this API.  The users will
> > need to be changed to call kernel_read_end(struct file *file, void *buf)
> > instead of vfree() so it can call allow_write_access() for them.
> > 
> > vmap_file_range() has a lot of potential uses.  I'm surprised we don't
> > have it already, to be honest.
> 
> Prior to kernel_read_file(), the same or very similar code existed in
> multiple places in the kernel.  The kernel_read_file() API
> consolidated the existing code, adding the pre and post security hooks.
> 
> With this new design of not using a private vmalloc, will the file
> data be accessible prior to the post security hooks?  From an IMA
> perspective, the hooks are used for measuring and/or verifying the
> integrity of the file.

File data is already accessible prior to the post security hooks.
Look how kernel_read_file works:

        ret = deny_write_access(file);
        ret = security_kernel_read_file(file, id);
                *buf = vmalloc(i_size);
                bytes = kernel_read(file, *buf + pos, i_size - pos, &pos);
        ret = security_kernel_post_read_file(file, *buf, i_size, id);

kernel_read() will read the data into the page cache and then copy it
into the vmalloc'd buffer.  There's nothing here to prevent read accesses
to the file.
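
To make the ordering concrete, here is a minimal userspace sketch of that
sequence.  It is a model under stated assumptions, not kernel code: every
name in it (model_file, model_read_file, post_hook_fn, hook_accept,
hook_reject) is hypothetical, malloc() stands in for vmalloc(), and memcpy()
stands in for kernel_read() copying out of the page cache.  It only shows
that the buffer is already filled by the time the post hook runs, and that
the hook's verdict decides whether the caller keeps the buffer.

```c
/* Userspace model only; every name here is hypothetical, not a kernel API. */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

struct model_file {
	const char *data;	/* stands in for the page cache contents */
	size_t size;		/* stands in for i_size */
	int writers_denied;	/* set by the deny_write_access() stand-in */
};

/* Post-read hook: it only ever sees a buffer that is already filled. */
typedef int (*post_hook_fn)(const void *buf, size_t len);

static int model_read_file(struct model_file *f, void **buf, size_t *size,
			   post_hook_fn post_hook)
{
	int ret;

	assert(f && buf && size && post_hook);
	f->writers_denied = 1;	/* blocks writers; readers are unaffected */

	/* vmalloc(i_size) stand-in; avoid a zero-byte allocation. */
	*buf = malloc(f->size ? f->size : 1);
	if (!*buf) {
		ret = -ENOMEM;
		goto out;
	}

	/* kernel_read() stand-in: copy from the "page cache" to the buffer. */
	memcpy(*buf, f->data, f->size);

	/* The data is readable here, before the post hook has judged it. */
	ret = post_hook(*buf, f->size);
	if (ret < 0) {
		free(*buf);
		*buf = NULL;
		goto out;
	}
	*size = f->size;
out:
	f->writers_denied = 0;	/* allow_write_access() stand-in */
	return ret;
}

/* Example hooks: one accepts any buffer, one rejects every buffer. */
static int hook_accept(const void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return 0;
}

static int hook_reject(const void *buf, size_t len)
{
	(void)buf;
	(void)len;
	return -EACCES;
}
```

Calling model_read_file() with hook_reject returns -EACCES and leaves *buf
set to NULL, but the copy has already happened by then.  Nothing in this
model, as in the sequence above, keeps other readers away from the file
contents in the meantime.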


