[PATCH v5 bpf-next 2/3] bpf: implement CAP_BPF
Alexei Starovoitov
alexei.starovoitov at gmail.com
Tue May 12 18:25:15 UTC 2020
On Tue, May 12, 2020 at 04:35:41PM +0200, Daniel Borkmann wrote:
> On 5/8/20 11:53 PM, Alexei Starovoitov wrote:
> > From: Alexei Starovoitov <ast at kernel.org>
> >
> > Implement permissions as stated in uapi/linux/capability.h
> > In order to do that the verifier allow_ptr_leaks flag is split
> > into allow_ptr_leaks and bpf_capable flags and they are set as:
> > env->allow_ptr_leaks = perfmon_capable();
> > env->bpf_capable = bpf_capable();
> >
> > bpf_capable enables bounded loops, variable stack access and other verifier features.
> > allow_ptr_leaks enable ptr leaks, ptr conversions, subtraction of pointers, etc.
> > It also disables side channel mitigations.
> >
> > That means that the networking BPF program loaded with CAP_BPF + CAP_NET_ADMIN will
> > have speculative checks done by the verifier and other spectre mitigation applied.
> > Such networking BPF program will not be able to leak kernel pointers.
>
> I don't quite follow this part in the code below yet, see my comments.
>
> > Signed-off-by: Alexei Starovoitov <ast at kernel.org>
> [...]
> > diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
> > index 6abd5a778fcd..c32a7880fa62 100644
> > --- a/include/linux/bpf_verifier.h
> > +++ b/include/linux/bpf_verifier.h
> > @@ -375,6 +375,7 @@ struct bpf_verifier_env {
> > u32 used_map_cnt; /* number of used maps */
> > u32 id_gen; /* used to generate unique reg IDs */
> > bool allow_ptr_leaks;
> > + bool bpf_capable;
> > bool seen_direct_write;
> > struct bpf_insn_aux_data *insn_aux_data; /* array of per-insn state */
> > const struct bpf_line_info *prev_linfo;
> > diff --git a/kernel/bpf/arraymap.c b/kernel/bpf/arraymap.c
> > index 95d77770353c..264a9254dc39 100644
> > --- a/kernel/bpf/arraymap.c
> > +++ b/kernel/bpf/arraymap.c
> > @@ -77,7 +77,7 @@ static struct bpf_map *array_map_alloc(union bpf_attr *attr)
> > bool percpu = attr->map_type == BPF_MAP_TYPE_PERCPU_ARRAY;
> > int ret, numa_node = bpf_map_attr_numa_node(attr);
> > u32 elem_size, index_mask, max_entries;
> > - bool unpriv = !capable(CAP_SYS_ADMIN);
> > + bool unpriv = !bpf_capable();
>
> So here progs loaded with CAP_BPF will have spectre mitigations bypassed which
> is the opposite of above statement, no?
right. good catch, but now I'm not sure it was such a good call to toss
spectre into cap_perfmon. It probably should be disabled under cap_bpf.
> > u64 cost, array_size, mask64;
> > struct bpf_map_memory mem;
> > struct bpf_array *array;
> [...]
> > diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> > index 6aa11de67315..8f421dd0c4cf 100644
> > --- a/kernel/bpf/core.c
> > +++ b/kernel/bpf/core.c
> > @@ -646,7 +646,7 @@ static bool bpf_prog_kallsyms_verify_off(const struct bpf_prog *fp)
> > void bpf_prog_kallsyms_add(struct bpf_prog *fp)
> > {
> > if (!bpf_prog_kallsyms_candidate(fp) ||
> > - !capable(CAP_SYS_ADMIN))
> > + !bpf_capable())
> > return;
> > bpf_prog_ksym_set_addr(fp);
> > @@ -824,7 +824,7 @@ static int bpf_jit_charge_modmem(u32 pages)
> > {
> > if (atomic_long_add_return(pages, &bpf_jit_current) >
> > (bpf_jit_limit >> PAGE_SHIFT)) {
> > - if (!capable(CAP_SYS_ADMIN)) {
> > + if (!bpf_capable()) {
>
> Should there still be an upper charge on module mem for !CAP_SYS_ADMIN?
hmm. cap_bpf is a subset of cap_sys_admin. I don't see a reason
to keep requiring cap_sys_admin here.
>
> > atomic_long_sub(pages, &bpf_jit_current);
> > return -EPERM;
> > }
> [...]
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 70ad009577f8..a6893746cd87 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> [...]
> > @@ -3428,7 +3429,7 @@ static int check_stack_boundary(struct bpf_verifier_env *env, int regno,
> > * Spectre masking for stack ALU.
> > * See also retrieve_ptr_limit().
> > */
> > - if (!env->allow_ptr_leaks) {
> > + if (!env->bpf_capable) {
>
> This needs to stay on env->allow_ptr_leaks, the can_skip_alu_sanitation() does
> check on env->allow_ptr_leaks as well, otherwise this breaks spectre mitgation
> when masking alu.
The patch kept it in can_skip_alu_sanitation(), but I missed it here.
Don't really recall the details of discussion around
commit 088ec26d9c2d ("bpf: Reject indirect var_off stack access in unpriv mode")
So thinking all over this bit will effectively disable variable
stack access which is one of main usability features.
So for v6 I'm thinking to put spectre bypass into cap_bpf.
allow_ptr_leak will mean only what the name says: pointer leaks only.
cap_bpf should not be given to user processes that want to become root
via spectre side channels.
I think it's a usability trade-off for cap_bpf.
Without indirect var under cap_bpf too many networking progs will be forced to use
cap_bpf+net_net_admin+cap_perfmon only to pass the verifier
while they don't really care about reading arbitrary memory via cap_perfmon.
More information about the Linux-security-module-archive
mailing list