Why add the general notification queue and its sources

Fri Sep 6 16:12:13 UTC 2019

Hi,

On 06/09/2019 16:53, Linus Torvalds wrote:
> On Fri, Sep 6, 2019 at 8:35 AM Linus Torvalds
> <torvalds at linux-foundation.org> wrote:
>> This is why I like pipes. You can use them today. They are simple, and
>> extensible, and you don't need to come up with a new subsystem and
>> some untested ad-hoc thing that nobody has actually used.
> The only _real_ complexity is to make sure that events are reliably parseable.
>
> That's where you really want to use the Linux-only "packet pipe"
> thing, becasue otherwise you have to have size markers or other things
> to delineate events. But if you do that, then it really becomes
> trivial.
>
> And I checked, we made it available to user space, even if the
> original reason for that code was kernel-only autofs use: you just
> need to make the pipe be O_DIRECT.
>
> This overly stupid program shows off the feature:
>
>          #define _GNU_SOURCE
>          #include <fcntl.h>
>          #include <unistd.h>
>
>          int main(int argc, char **argv)
>          {
>                  int fd[2];
>                  char buf[10];
>
>                  pipe2(fd, O_DIRECT | O_NONBLOCK);
>                  write(fd[1], "hello", 5);
>                  write(fd[1], "hi", 2);
>                  read(fd[0], buf, sizeof(buf));
>                  read(fd[0], buf, sizeof(buf));
>                  return 0;
>          }
>
> and it you strace it (because I was too lazy to add error handling or
> printing of results), you'll see
>
>      write(4, "hello", 5)                    = 5
>      write(4, "hi", 2)                       = 2
>      read(3, "hello", 10)                    = 5
>      read(3, "hi", 10)                       = 2
>
> note how you got packets of data on the reader side, instead of
> getting the traditional "just buffer it as a stream".
>
> So now you can even have multiple readers of the same event pipe, and
> packetization is obvious and trivial. Of course, I'm not sure why
> you'd want to have multiple readers, and you'd lose _ordering_, but if
> all events are independent, this _might_ be a useful thing in a
> threaded environment. Maybe.
>
> (Side note: a zero-sized write will not cause a zero-sized packet. It
> will just be dropped).
>
>                 Linus

The events are generally not independent - we would need ordering either 
implicit in the protocol or explicit in the messages. We also need to 
know in case messages are dropped too - doesn't need to be anything 
fancy, just some idea that since we last did a read, there are messages 
that got lost, most likely due to buffer overrun.

That is why the initial idea was to use netlink, since it solves a lot 
of those issues. The downside was that the indirect nature of the 
netlink sockets resulted in making it tricky to know the namespace of 
the process to which the message was to be delivered (and hence whether 
it should be delivered at all),

Steve.