[PATCH v7 1/8] fs: introduce kernel_pread_file* support

Mon Jun 8 13:16:30 UTC 2020

On Mon, Jun 08, 2020 at 09:03:21AM -0400, Mimi Zohar wrote:
> On Sat, 2020-06-06 at 08:52 -0700, Matthew Wilcox wrote:
> > On Fri, Jun 05, 2020 at 10:04:51PM -0700, Scott Branden wrote:
> > > -int kernel_read_file(struct file *file, void **buf, loff_t *size,
> > > -		     loff_t max_size, enum kernel_read_file_id id)
> > > -{
> > > -	loff_t i_size, pos;
> > > +int kernel_pread_file(struct file *file, void **buf, loff_t *size,
> > > +		      loff_t pos, loff_t max_size,
> > > +		      enum kernel_pread_opt opt,
> > > +		      enum kernel_read_file_id id)
> > > +{
> > > +	loff_t alloc_size;
> > > +	loff_t buf_pos;
> > > +	loff_t read_end;
> > > +	loff_t i_size;
> > >  	ssize_t bytes = 0;
> > >  	int ret;
> > >  
> > 
> > Look, it's not your fault, but this is a great example of how we end
> > up with atrocious interfaces.  Someone comes along and implements a
> > simple DWIM interface that solves their problem.  Then somebody else
> > adds a slight variant that solves their problem, and so on and so on,
> > and we end up with this bonkers API where the arguments literally change
> > meaning depending on other arguments.
> > 
> > > @@ -950,21 +955,31 @@ int kernel_read_file(struct file *file, void **buf, loff_t *size,
> > >  		ret = -EINVAL;
> > >  		goto out;
> > >  	}
> > > -	if (i_size > SIZE_MAX || (max_size > 0 && i_size > max_size)) {
> > > +
> > > +	/* Default read to end of file */
> > > +	read_end = i_size;
> > > +
> > > +	/* Allow reading partial portion of file */
> > > +	if ((opt == KERNEL_PREAD_PART) &&
> > > +	    (i_size > (pos + max_size)))
> > > +		read_end = pos + max_size;
> > > +
> > > +	alloc_size = read_end - pos;
> > > +	if (i_size > SIZE_MAX || (max_size > 0 && alloc_size > max_size)) {
> > >  		ret = -EFBIG;
> > >  		goto out;
> > 
> > ... like that.
> > 
> > I think what we actually want is:
> > 
> > ssize_t vmap_file_range(struct file *, loff_t start, loff_t end, void **bufp);
> > void vunmap_file_range(struct file *, void *buf);
> > 
> > If end > i_size, limit the allocation to i_size.  Returns the number
> > of bytes allocated, or a negative errno.  Writes the pointer allocated
> > to *bufp.  Internally, it should use the page cache to read in the pages
> > (taking appropriate reference counts).  Then it maps them using vmap()
> > instead of copying them to a private vmalloc() array.
> > 
> > kernel_read_file() can be converted to use this API.  The users will
> > need to be changed to call kernel_read_end(struct file *file, void *buf)
> > instead of vfree() so it can call allow_write_access() for them.
> > 
> > vmap_file_range() has a lot of potential uses.  I'm surprised we don't
> > have it already, to be honest.
> 
> Prior to kernel_read_file() the same or verify similar code existed in
> multiple places in the kernel.  The kernel_read_file() API
> consolidated the existing code adding the pre and post security hooks.
> 
> With this new design of not using a private vmalloc, will the file
> data be accessible prior to the post security hooks?  From an IMA
> perspective, the hooks are used for measuring and/or verifying the
> integrity of the file.

File data is already accessible prior to the post security hooks.
Look how kernel_read_file works:

        ret = deny_write_access(file);
        ret = security_kernel_read_file(file, id);
                *buf = vmalloc(i_size);
                bytes = kernel_read(file, *buf + pos, i_size - pos, &pos);
        ret = security_kernel_post_read_file(file, *buf, i_size, id);

kernel_read() will read the data into the page cache and then copy it
into the vmalloc'd buffer.  There's nothing here to prevent read accesses
to the file.