[PATCH v11 bpf-next 03/17] bpf: introduce BPF token object
Andrii Nakryiko
andrii.nakryiko at gmail.com
Wed Nov 29 00:05:36 UTC 2023
On Mon, Nov 27, 2023 at 11:06 AM Andrii Nakryiko <andrii at kernel.org> wrote:
>
> Add new kind of BPF kernel object, BPF token. BPF token is meant to
> allow delegating privileged BPF functionality, like loading a BPF
> program or creating a BPF map, from privileged process to a *trusted*
> unprivileged process, all while having a good amount of control over which
> privileged operations could be performed using provided BPF token.
>
> This is achieved through mounting BPF FS instance with extra delegation
> mount options, which determine what operations are delegatable, and also
> constraining it to the owning user namespace (as mentioned in the
> previous patch).
>
> BPF token itself is just a derivative from BPF FS and can be created
> through a new bpf() syscall command, BPF_TOKEN_CREATE, which accepts BPF
> FS FD, which can be attained through open() API by opening BPF FS mount
> point. Currently, BPF token "inherits" delegated command, map type,
> prog type, and attach type bit sets from BPF FS as is. In the future,
> with BPF token being a separate object with its own FD, we can allow
> further restricting BPF token's allowable set of operations, either at
> creation time or after the fact, letting the process guard itself
> against unintentionally loading undesired kinds of BPF programs. But
> for now we keep things simple and just copy bit sets as is.
>
> When BPF token is created from BPF FS mount, we take reference to the
> BPF super block's owning user namespace, and then use that namespace for
> checking all the {CAP_BPF, CAP_PERFMON, CAP_NET_ADMIN, CAP_SYS_ADMIN}
> capabilities that are normally only checked against init userns (using
> capable()), but now we check them using ns_capable() instead (if BPF
> token is provided). See bpf_token_capable() for details.
>
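The switch from capable() to ns_capable() can be illustrated with a small
userspace model. Everything below is a hypothetical sketch: the model_* names
and capability bits are made up for illustration, and the namespace check is
simplified to pointer equality (it ignores ancestor namespaces). See
bpf_token_capable() in the patch for the real logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical capability bits for this model (not the kernel's CAP_* values). */
enum { MODEL_CAP_BPF = 1 << 0, MODEL_CAP_PERFMON = 1 << 1 };

struct model_userns { int id; };                 /* stand-in for a user namespace */
static const struct model_userns model_init_ns = { 0 };

struct model_task {
	const struct model_userns *userns;       /* namespace the task runs in */
	unsigned int caps;                       /* capabilities held in that namespace */
};

struct model_token {
	const struct model_userns *userns;       /* BPF FS owning namespace, pinned at creation */
};

/* Simplified ns_capable(): the task must be in exactly that namespace and hold the cap. */
static bool model_ns_capable(const struct model_task *t,
			     const struct model_userns *ns, unsigned int cap)
{
	assert(t != NULL);
	return t->userns == ns && (t->caps & cap) != 0;
}

/* Simplified capable(): same check, but always against the init namespace. */
static bool model_capable(const struct model_task *t, unsigned int cap)
{
	return model_ns_capable(t, &model_init_ns, cap);
}

/* With a token, check against the token's namespace; without one, against init. */
static bool model_token_capable(const struct model_task *t,
				const struct model_token *tok, unsigned int cap)
{
	if (tok)
		return model_ns_capable(t, tok->userns, cap);
	return model_capable(t, cap);
}
```

In this model a containerized task holding the BPF capability fails the check
without a token and passes it with one, while a capability it does not hold
is refused either way.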
> Such setup means that BPF token in itself is not sufficient to grant BPF
> functionality. User namespaced process has to *also* have necessary
> combination of capabilities inside that user namespace. So while
> previously CAP_BPF was useless when granted within user namespace, now
> it gains a meaning and allows container managers and sys admins to have
> flexible control over which processes can and need to use BPF
> functionality within the user namespace (i.e., container in practice).
> And BPF FS delegation mount options and derived BPF tokens serve as
> a per-container "flag" to grant overall ability to use bpf() (plus further
> restrict which parts of the bpf() syscall are treated as namespaced).
>
> Note also, BPF_TOKEN_CREATE command itself requires ns_capable(CAP_BPF)
> within the BPF FS owning user namespace, rounding out the ns_capable()
> story of BPF token.
>
> Signed-off-by: Andrii Nakryiko <andrii at kernel.org>
> ---
>  include/linux/bpf.h            |  41 +++++++
>  include/uapi/linux/bpf.h       |  37 ++++++
>  kernel/bpf/Makefile            |   2 +-
>  kernel/bpf/inode.c             |  17 ++-
>  kernel/bpf/syscall.c           |  17 +++
>  kernel/bpf/token.c             | 209 +++++++++++++++++++++++++++++++++
>  tools/include/uapi/linux/bpf.h |  37 ++++++
>  7 files changed, 350 insertions(+), 10 deletions(-)
>  create mode 100644 kernel/bpf/token.c
>
[...]
> +int bpf_token_create(union bpf_attr *attr)
> +{
> +	struct bpf_mount_opts *mnt_opts;
> +	struct bpf_token *token = NULL;
> +	struct user_namespace *userns;
> +	struct inode *inode;
> +	struct file *file;
> +	struct path path;
> +	struct fd f;
> +	umode_t mode;
> +	int err, fd;
> +
> +	f = fdget(attr->token_create.bpffs_fd);
> +	if (!f.file)
> +		return -EBADF;
> +
> +	path = f.file->f_path;
> +	path_get(&path);
> +	fdput(f);
> +
> +	if (path.dentry != path.mnt->mnt_sb->s_root) {
> +		err = -EINVAL;
> +		goto out_path;
> +	}
> +	if (path.mnt->mnt_sb->s_op != &bpf_super_ops) {
> +		err = -EINVAL;
> +		goto out_path;
> +	}
> +	err = path_permission(&path, MAY_ACCESS);
> +	if (err)
> +		goto out_path;
> +
> +	userns = path.dentry->d_sb->s_user_ns;
> +	/*
> +	 * Enforce that creators of BPF tokens are in the same user
> +	 * namespace as the BPF FS instance. This makes reasoning about
> +	 * permissions a lot easier and we can always relax this later.
> +	 */
> +	if (current_user_ns() != userns) {
> +		err = -EPERM;
> +		goto out_path;
> +	}

Hey Christian,

I've added a stricter userns check, as discussed on the previous revision,
and a few lines above fixed the BPF FS root check (path.dentry !=
path.mnt->mnt_sb->s_root). Hopefully that addresses the remaining
concerns you've had.

I'd appreciate it if you could take another look to double check that
I'm not messing anything up, and if it all looks good, can I please
get an ack from you? Thank you!

> +	if (!ns_capable(userns, CAP_BPF)) {
> +		err = -EPERM;
> +		goto out_path;
> +	}
> +
> +	mode = S_IFREG | ((S_IRUSR | S_IWUSR) & ~current_umask());
> +	inode = bpf_get_inode(path.mnt->mnt_sb, NULL, mode);
> +	if (IS_ERR(inode)) {
> +		err = PTR_ERR(inode);
> +		goto out_path;
> +	}
> +
[...]
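
For readers following the mode computation in the hunk above, here is a small
userspace sketch of the same expression. The helper name is hypothetical, and
the umask is passed in explicitly rather than read via current_umask(), so the
arithmetic can be tried outside the kernel.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>

/*
 * Hypothetical helper mirroring the quoted expression:
 * a regular file, owner read/write, minus whatever the umask strips.
 */
static mode_t token_inode_mode(mode_t umask_bits)
{
	/* A umask only carries permission bits. */
	assert((umask_bits & ~0777) == 0);
	return S_IFREG | ((S_IRUSR | S_IWUSR) & ~umask_bits);
}
```

With the common umask of 022 the result is S_IFREG | 0600, since 022 touches
neither owner bit. A umask of 0200 strips owner write and leaves
S_IFREG | 0400.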