[PATCH v9 bpf-next 03/17] bpf: introduce BPF token object
Andrii Nakryiko
andrii.nakryiko at gmail.com
Wed Nov 8 21:09:35 UTC 2023
On Wed, Nov 8, 2023 at 6:28 AM Christian Brauner <brauner at kernel.org> wrote:
>
> On Fri, Nov 03, 2023 at 12:05:09PM -0700, Andrii Nakryiko wrote:
> > Add new kind of BPF kernel object, BPF token. BPF token is meant to
> > allow delegating privileged BPF functionality, like loading a BPF
> > program or creating a BPF map, from privileged process to a *trusted*
> > unprivileged process, all while have a good amount of control over which
> > privileged operations could be performed using provided BPF token.
> >
> > This is achieved through mounting BPF FS instance with extra delegation
> > mount options, which determine what operations are delegatable, and also
> > constraining it to the owning user namespace (as mentioned in the
> > previous patch).
> >
> > BPF token itself is just a derivative from BPF FS and can be created
> > through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts
> > a path specification (using the usual fd + string path combo) to a BPF
> > FS mount. Currently, BPF token "inherits" delegated command, map types,
> > prog type, and attach type bit sets from BPF FS as is. In the future,
> > having an BPF token as a separate object with its own FD, we can allow
> > to further restrict BPF token's allowable set of things either at the creation
> > time or after the fact, allowing the process to guard itself further
> > from, e.g., unintentionally trying to load undesired kind of BPF
> > programs. But for now we keep things simple and just copy bit sets as is.
> >
> > When BPF token is created from BPF FS mount, we take reference to the
> > BPF super block's owning user namespace, and then use that namespace for
> > checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
> > capabilities that are normally only checked against init userns (using
> > capable()), but now we check them using ns_capable() instead (if BPF
> > token is provided). See bpf_token_capable() for details.
> >
> > Such setup means that BPF token in itself is not sufficient to grant BPF
> > functionality. User namespaced process has to *also* have necessary
> > combination of capabilities inside that user namespace. So while
> > previously CAP_BPF was useless when granted within user namespace, now
> > it gains a meaning and allows container managers and sys admins to have
> > a flexible control over which processes can and need to use BPF
> > functionality within the user namespace (i.e., container in practice).
> > And BPF FS delegation mount options and derived BPF tokens serve as
> > a per-container "flag" to grant overall ability to use bpf() (plus further
> > restrict on which parts of bpf() syscalls are treated as namespaced).
> >
> > Note also, BPF_TOKEN_CREATE command itself requires ns_capable(CAP_BPF)
> > within the BPF FS owning user namespace, rounding up the ns_capable()
> > story of BPF token.
> >
> > The alternative to creating BPF token object was:
> > a) not having any extra object and just pasing BPF FS path to each
> > relevant bpf() command. This seems suboptimal as it's racy (mount
> > under the same path might change in between checking it and using it
> > for bpf() command). And also less flexible if we'd like to further
>
> I don't understand "mount under the same path might change in between
> checking it and using it for bpf() command".
>
> Just require userspace to open() the bpffs instance and pass that fd to
> bpf() just as you're doing right now. If that is racy then the current
> implementation is even more so because it is passing:
>
> bpffs_path_fd
> bpffs_pathname
>
> and then performs a lookup. More on that below.
Yes, this is a result of my initial confusion with how O_PATH-based
open() works. You are right that it's not racy, I'll update the
message.
>
> I want to point out that most of this code here is unnecessary if you
> use the bpffs fd itself as a token. But that's your decision. I'm just
> saying that I'm not sure the critique that it's racy is valid.
Ack.
>
> > restrict ourselves compared to all the delegated functionality
> > allowed on BPF FS.
> > b) use non-bpf() interface, e.g., ioctl(), but otherwise also create
> > a dedicated FD that would represent a token-like functionality. This
> > doesn't seem superior to having a proper bpf() command, so
> > BPF_TOKEN_CREATE was chosen.
> >
> > Signed-off-by: Andrii Nakryiko <andrii at kernel.org>
> > ---
> > include/linux/bpf.h | 41 +++++++
> > include/uapi/linux/bpf.h | 39 +++++++
> > kernel/bpf/Makefile | 2 +-
> > kernel/bpf/inode.c | 17 ++-
> > kernel/bpf/syscall.c | 17 +++
> > kernel/bpf/token.c | 197 +++++++++++++++++++++++++++++++++
> > tools/include/uapi/linux/bpf.h | 39 +++++++
> > 7 files changed, 342 insertions(+), 10 deletions(-)
> > create mode 100644 kernel/bpf/token.c
> >
[...]
> > +
> > +#define BPF_TOKEN_INODE_NAME "bpf-token"
> > +
> > +static const struct inode_operations bpf_token_iops = { };
> > +
> > +static const struct file_operations bpf_token_fops = {
> > + .release = bpf_token_release,
> > + .show_fdinfo = bpf_token_show_fdinfo,
> > +};
> > +
> > +int bpf_token_create(union bpf_attr *attr)
> > +{
> > + struct bpf_mount_opts *mnt_opts;
> > + struct bpf_token *token = NULL;
> > + struct user_namespace *userns;
> > + struct inode *inode;
> > + struct file *file;
> > + struct path path;
> > + umode_t mode;
> > + int err, fd;
> > +
> > + err = user_path_at(attr->token_create.bpffs_path_fd,
> > + u64_to_user_ptr(attr->token_create.bpffs_pathname),
> > + LOOKUP_FOLLOW | LOOKUP_EMPTY, &path);
>
> Do you really need bpffs_path_fd and bpffs_pathname?
> This seems unnecessar as you're forcing a lookup that's best done in
> userspace through regular open() apis. So I would just make this:
>
> struct { /* struct used by BPF_TOKEN_CREATE command */
> __u32 flags;
> __u32 bpffs_path_fd;
> } token_create;
>
> In bpf_token_create() you can then just do:
>
> struct fd f;
> struct path path;
>
> f = fdget(attr->token_create.bpffs_path_fd);
> if (!f.file)
> return -EBADF;
>
> *path = f.file->f_path;
> path_get(path);
> fdput(f);
>
Yes, you are right. I'll simplify this part, thanks.
> > + if (err)
> > + return err;
> > +
> > + if (path.mnt->mnt_root != path.dentry) {
> > + err = -EINVAL;
> > + goto out_path;
> > + }
> > + if (path.mnt->mnt_sb->s_op != &bpf_super_ops) {
> > + err = -EINVAL;
> > + goto out_path;
> > + }
[...]
More information about the Linux-security-module-archive
mailing list