[PATCH] userfaultfd, capability: introduce CAP_USERFAULTFD

Fri Feb 25 18:17:06 UTC 2022

Thanks for the detailed explanation, Casey!

On Thu, Feb 24, 2022 at 6:58 PM Peter Xu <peterx at redhat.com> wrote:
>
> On Thu, Feb 24, 2022 at 04:39:44PM -0800, Casey Schaufler wrote:
> > What I'd want to see is multiple users where the use of CAP_USERFAULTFD
> > is independent of the use of CAP_SYS_PTRACE. That is, the programs would
> > never require CAP_SYS_PTRACE. There should be demonstrated real value.
> > Not just that a compromised program with CAP_SYS_PTRACE can do bad things,
> > but that the programs with CAP_USERFAULTFD are somehow susceptible to
> > being exploited into doing those bad things. Hypothetical users are just
> > that, and often don't materialize.
>
> I kind of have the same question indeed..
>
> The use case we're talking about is VM migration, and the in-question
> subject is literally the migration process or thread.  Isn't that a trusted
> piece of software already?
>
> Then the question is why the extra capability (in CAP_PTRACE but not in
> CAP_UFFD) could bring much risk to the system.  Axel, did I miss something
> important?

For me it's just a matter of giving the live migration process as
little power as I can while still letting it do its job.

Live migration is somewhat trusted, and certainly if it can mess with
the memory contents of its own VM, that's no concern. But there are
other processes or threads running alongside it to manage other parts
of the VM, like attached virtual disks. Also it's probably running on
a server which also hosts other VMs, and I think it's a common design
to have them all run as the same user (although they may be running
in separate containers).

So, it seems unfortunate to me that the live migration process can
just ptrace() any of these other things running alongside it.

Casey is right that we can restrict what it can do with e.g. SELinux
or seccomp-bpf or whatever else. But it seems to me a more fragile
design to give the permissions and then restrict them, vs. just never
giving those permissions in the first place.

In any case though, it sounds like folks are more amenable to the
device node approach. Honestly, I got that impression from Andrea as
well when we first talked about this some months ago. So, I can pursue
that approach instead.

>
> Thanks,

>
> --
> Peter Xu
>