[RFC PATCH 04/14] pipe: Add O_NOTIFICATION_PIPE [ver #2]

David Howells dhowells at redhat.com
Thu Nov 7 18:48:37 UTC 2019


Andy Lutomirski <luto at kernel.org> wrote:

> > Add an O_NOTIFICATION_PIPE flag that can be passed to pipe2() to indicate
> > that the pipe being created is going to be used for notifications.  This
> > suppresses the use of splice(), vmsplice(), tee() and sendfile() on the
> > pipe as calling iov_iter_revert() on a pipe when a kernel notification
> > message has been inserted into the middle of a multi-buffer splice will be
> > messy.
>
> How messy?

Well, iov_iter_revert() on a pipe iterator simply walks backwards along the
ring discarding the last N contiguous slots (where N is normally the number of
slots that were filled by whatever operation is being reverted).

However, unless the code that transfers stuff into the pipe takes the
spinlock and disables softirqs for the duration of its ring filling, what were
N contiguous slots may now have kernel notifications interspersed - even if it
has been holding the pipe mutex.
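
As a rough illustration of the failure mode, here is a userspace toy model
(not kernel code; toy_ring, toy_push(), toy_revert() and RING_SLOTS are all
made-up names).  It assumes a revert that just drops the last N slots, and
counts how many notification slots get thrown away by mistake:

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS 8	/* hypothetical ring size */

enum slot_kind { SLOT_EMPTY, SLOT_SPLICE, SLOT_NOTIFY };

struct toy_ring {
	enum slot_kind slot[RING_SLOTS];
	unsigned int head;	/* next slot to fill */
	unsigned int tail;	/* next slot to read */
};

/* Append one slot of the given kind at the head of the ring. */
static void toy_push(struct toy_ring *r, enum slot_kind kind)
{
	assert(r->head - r->tail < RING_SLOTS);	/* ring must not be full */
	r->slot[r->head % RING_SLOTS] = kind;
	r->head++;
}

/*
 * Naive revert: discard the last n slots on the assumption that they are
 * contiguous and all belong to the caller.  Returns the number of
 * notification slots that were discarded by mistake.
 */
static unsigned int toy_revert(struct toy_ring *r, unsigned int n)
{
	unsigned int lost = 0;

	while (n-- && r->head != r->tail) {
		r->head--;
		if (r->slot[r->head % RING_SLOTS] == SLOT_NOTIFY)
			lost++;
		r->slot[r->head % RING_SLOTS] = SLOT_EMPTY;
	}
	return lost;
}
```

With a ring filled as splice, notify, splice, reverting two slots in this
model drops the notification and leaves the first splice buffer behind.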

So, now what do you do?  You have to free up just the buffers relevant to the
iterator and then you can either compact down the ring to free up the space or
you can leave null slots and let the read side clean them up, thereby
reducing the capacity of the pipe temporarily.
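
The two choices can be sketched in a userspace toy model (again not kernel
code; every name below is hypothetical, and locking is left out entirely,
which is the hard part in practice).  toy_revert_holes() frees only the
splice slots and leaves null slots behind; toy_compact() then slides the
surviving slots down to reclaim the space:

```c
#include <assert.h>
#include <stddef.h>

#define RING_SLOTS 8	/* hypothetical ring size */

enum slot_kind { SLOT_EMPTY, SLOT_SPLICE, SLOT_NOTIFY };

struct toy_ring {
	enum slot_kind slot[RING_SLOTS];
	unsigned int head;	/* next slot to fill */
	unsigned int tail;	/* next slot to read */
};

/*
 * Free up to n splice slots, scanning back from head and skipping over
 * notifications.  Head is not moved, so the freed slots stay as holes
 * and the ring's usable capacity is reduced until they are cleaned up.
 */
static unsigned int toy_revert_holes(struct toy_ring *r, unsigned int n)
{
	unsigned int i = r->head, freed = 0;

	assert(r->head - r->tail <= RING_SLOTS);
	while (freed < n && i != r->tail) {
		i--;
		if (r->slot[i % RING_SLOTS] == SLOT_SPLICE) {
			r->slot[i % RING_SLOTS] = SLOT_EMPTY;
			freed++;
		}
	}
	return freed;
}

/* Slide the live slots down towards tail, closing up any holes. */
static void toy_compact(struct toy_ring *r)
{
	unsigned int src, dst = r->tail;

	for (src = r->tail; src != r->head; src++) {
		enum slot_kind k = r->slot[src % RING_SLOTS];

		if (k == SLOT_EMPTY)
			continue;
		r->slot[src % RING_SLOTS] = SLOT_EMPTY;
		r->slot[dst % RING_SLOTS] = k;
		dst++;
	}
	r->head = dst;
}
```

Both functions walk and modify slots that a notification poster could be
touching at the same time, which is why either approach would need the
spinlock held.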

Either way, iov_iter_revert() gets more complex and has to hold the spinlock.

And if you don't take the spinlock whilst you're reverting, more notifications
can come in to make your life more interesting.

There's also a problem with splicing out from a notification pipe: the
messages are written onto preallocated buffers, but splicing would mean those
buffers need refcounts and, in any case, they are of limited quantity.

> And is there some way to make it impossible for this to happen?

Yes.  That's what I'm doing by declaring the pipe to be unspliceable up front.

> Adding a new flag to pipe2() to avoid messy kernel code seems
> like a poor tradeoff.

By far the easiest place to check whether a pipe can be spliced to is in
get_pipe_info().  That's checking the file anyway.  After that, you can't make
the check until the pipe is locked.
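
The shape of such an upfront check might look roughly like the following
userspace sketch.  This is only a model under assumptions, not the patch
itself: toy_file, toy_pipe, toy_get_pipe_info(), the for_splice parameter
and the TOY_NOTIFICATION_PIPE bit are all invented here for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NOTIFICATION_PIPE 0x1u	/* hypothetical flag bit */

struct toy_pipe {
	int unused;		/* placeholder for the real pipe state */
};

struct toy_file {
	unsigned int flags;	/* flags fixed when the pipe was created */
	struct toy_pipe *pipe;
};

/*
 * Look up the pipe behind a file.  Splice-type callers get NULL for a
 * notification pipe, so they bail out before any pipe lock is taken.
 */
static struct toy_pipe *toy_get_pipe_info(struct toy_file *file,
					  bool for_splice)
{
	assert(file != NULL);
	if (for_splice && (file->flags & TOY_NOTIFICATION_PIPE))
		return NULL;
	return file->pipe;
}
```

Because the flag is fixed at creation time in this model, the answer cannot
change while a splice is sleeping with the pipe mutex dropped.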

Furthermore, if it's not done upfront, the change to the pipe might happen
during a splicing operation that's residing in pipe_wait()... which drops the
pipe mutex.

David



