[PATCH 05/12] libbpf: Support exclusive map creation

Andrii Nakryiko andrii.nakryiko at gmail.com
Fri Jun 13 16:51:32 UTC 2025


On Thu, Jun 12, 2025 at 4:42 PM KP Singh <kpsingh at kernel.org> wrote:
>
> On Fri, Jun 13, 2025 at 12:56 AM Andrii Nakryiko
> <andrii.nakryiko at gmail.com> wrote:
> >
> > On Fri, Jun 6, 2025 at 4:29 PM KP Singh <kpsingh at kernel.org> wrote:
> > >
> > > Implement a convenient method i.e. bpf_map__make_exclusive which
> > > calculates the hash for the program and registers it with the map for
> > > creation as an exclusive map when the objects are loaded.
> > >
> > > The hash of the program must be computed after all the relocations are
> > > done.
> > >
> > > Signed-off-by: KP Singh <kpsingh at kernel.org>
> > > ---
> > >  tools/lib/bpf/bpf.c            |  4 +-
> > >  tools/lib/bpf/bpf.h            |  4 +-
> > >  tools/lib/bpf/libbpf.c         | 68 +++++++++++++++++++++++++++++++++-
> > >  tools/lib/bpf/libbpf.h         | 13 +++++++
> > >  tools/lib/bpf/libbpf.map       |  5 +++
> > >  tools/lib/bpf/libbpf_version.h |  2 +-
> > >  6 files changed, 92 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
> > > index a9c3e33d0f8a..11fa2d64ccca 100644
> > > --- a/tools/lib/bpf/bpf.c
> > > +++ b/tools/lib/bpf/bpf.c
> > > @@ -172,7 +172,7 @@ int bpf_map_create(enum bpf_map_type map_type,
> > >                    __u32 max_entries,
> > >                    const struct bpf_map_create_opts *opts)
> > >  {
> > > -       const size_t attr_sz = offsetofend(union bpf_attr, map_token_fd);
> > > +       const size_t attr_sz = offsetofend(union bpf_attr, excl_prog_hash);
> > >         union bpf_attr attr;
> > >         int fd;
> > >
> > > @@ -203,6 +203,8 @@ int bpf_map_create(enum bpf_map_type map_type,
> > >         attr.map_ifindex = OPTS_GET(opts, map_ifindex, 0);
> > >
> > >         attr.map_token_fd = OPTS_GET(opts, token_fd, 0);
> > > +       attr.excl_prog_hash = ptr_to_u64(OPTS_GET(opts, excl_prog_hash, NULL));
> > > +       attr.excl_prog_hash_size = OPTS_GET(opts, excl_prog_hash_size, 0);
> > >
> > >         fd = sys_bpf_fd(BPF_MAP_CREATE, &attr, attr_sz);
> > >         return libbpf_err_errno(fd);
> > > diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
> > > index 777627d33d25..a82b79c0c349 100644
> > > --- a/tools/lib/bpf/bpf.h
> > > +++ b/tools/lib/bpf/bpf.h
> > > @@ -54,9 +54,11 @@ struct bpf_map_create_opts {
> > >         __s32 value_type_btf_obj_fd;
> > >
> > >         __u32 token_fd;
> > > +       __u32 excl_prog_hash_size;
> > > +       const void *excl_prog_hash;
> > >         size_t :0;
> > >  };
> > > -#define bpf_map_create_opts__last_field token_fd
> > > +#define bpf_map_create_opts__last_field excl_prog_hash
> > >
> > >  LIBBPF_API int bpf_map_create(enum bpf_map_type map_type,
> > >                               const char *map_name,
> > > diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
> > > index 475038d04cb4..17de756973f4 100644
> > > --- a/tools/lib/bpf/libbpf.c
> > > +++ b/tools/lib/bpf/libbpf.c
> > > @@ -499,6 +499,7 @@ struct bpf_program {
> > >         __u32 line_info_rec_size;
> > >         __u32 line_info_cnt;
> > >         __u32 prog_flags;
> > > +       __u8  hash[SHA256_DIGEST_LENGTH];
> > >  };
> > >
> > >  struct bpf_struct_ops {
> > > @@ -578,6 +579,8 @@ struct bpf_map {
> > >         bool autocreate;
> > >         bool autoattach;
> > >         __u64 map_extra;
> > > +       const void *excl_prog_sha;
> > > +       __u32 excl_prog_sha_size;
> > >  };
> > >
> > >  enum extern_type {
> > > @@ -4485,6 +4488,43 @@ bpf_object__section_to_libbpf_map_type(const struct bpf_object *obj, int shndx)
> > >         }
> > >  }
> > >
> > > +static int bpf_program__compute_hash(struct bpf_program *prog)
> > > +{
> > > +       struct bpf_insn *purged;
> > > +       bool was_ld_map;
> > > +       int i, err;
> > > +
> > > +       purged = calloc(1, BPF_INSN_SZ * prog->insns_cnt);
> > > +       if (!purged)
> > > +               return -ENOMEM;
> > > +
> > > +       /* If relocations have been done, the map_fd needs to be
> > > +        * discarded for the digest calculation.
> > > +        */
> >
> > all this looks sketchy, let's think about some more robust approach
> > here rather than randomly clearing some fields of some instructions...
>
> This is exactly what the kernel does:
>
> https://elixir.bootlin.com/linux/v6.15.1/source/kernel/bpf/core.c#L314
>
> We will need to update both. It does not clear whole instructions;
> it clears the immediate value holding the map FD, which is an
> unstable value.

Looking at what libbpf is doing with relocations, we are missing the
case of src_reg == BPF_PSEUDO_BTF_ID, in which we set insn[1].imm to
the kernel module BTF FD (an unstable value as well). So I guess we
should fix the kernel-side logic there too?


But overall, it's of course funny: for a long while we've been saying
that calculating the signature/hash of a BPF program by masking out
some parts of instructions (those containing FDs and addresses) is
neither good nor secure. Now we are doing exactly that to "predict"
and define which BPF program has exclusivity rights. This shouldn't be
a problem for lskel, as BPF program code is supposed to be stable, but
it feels weird to do it in the general case.

>
> >
> > > +       for (i = 0, was_ld_map = false; i < prog->insns_cnt; i++) {
> > > +               purged[i] = prog->insns[i];
> > > +               if (!was_ld_map &&
> > > +                   purged[i].code == (BPF_LD | BPF_IMM | BPF_DW) &&
> > > +                   (purged[i].src_reg == BPF_PSEUDO_MAP_FD ||
> > > +                    purged[i].src_reg == BPF_PSEUDO_MAP_VALUE)) {
> > > +                       was_ld_map = true;
> > > +                       purged[i].imm = 0;
> > > +               } else if (was_ld_map && purged[i].code == 0 &&
> > > +                          purged[i].dst_reg == 0 && purged[i].src_reg == 0 &&
> > > +                          purged[i].off == 0) {
> > > +                       was_ld_map = false;
> > > +                       purged[i].imm = 0;
> > > +               } else {
> > > +                       was_ld_map = false;
> > > +               }
> > > +       }
> >
> > this was_ld_map business is... unnecessary? Just access purged[i + 1]
> > (checking i + 1 < prog->insns_cnt, of course), and i += 1. This
> > stateful approach is an unnecessary complication, IMO
>
> Again, I did not do much here. Happy to make it better though.
>

I don't know why the kernel code was written in this more stateful
form, but I find it much harder to follow, especially given that it's
trivial to handle the two-instruction ldimm64 in one go. Both libbpf
and the verifier handle ldimm64 as insn[0] and insn[1] at the same
time elsewhere (e.g., see resolve_pseudo_ldimm64()). So at least on
the libbpf side, let's do the simpler implementation (I'd do it for
the kernel code as well, but I'm not insisting).
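To make the suggestion concrete, here's a rough sketch of the stateless
loop I have in mind. The struct layout and BPF_* constants mirror
<linux/bpf.h> but are redefined here only to keep the example
self-contained; the BPF_PSEUDO_BTF_ID case is included per the point
above, and zeroing both halves' imm is conservative, matching what the
kernel's tag calculation does:

```c
#include <string.h>

/* Minimal stand-in for struct bpf_insn from <linux/bpf.h> */
struct bpf_insn {
	unsigned char code;
	unsigned char dst_reg:4;
	unsigned char src_reg:4;
	short off;
	int imm;
};

#define BPF_LD			0x00
#define BPF_IMM			0x00
#define BPF_DW			0x18
#define BPF_PSEUDO_MAP_FD	1
#define BPF_PSEUDO_MAP_VALUE	2
#define BPF_PSEUDO_BTF_ID	3

/* Copy insns into purged[], zeroing the unstable imm fields of
 * two-instruction ldimm64 loads (map FD in insn[0], and e.g. the
 * module BTF FD in insn[1] for BPF_PSEUDO_BTF_ID) in one go,
 * instead of carrying was_ld_map state across iterations.
 */
static void mask_unstable_imms(const struct bpf_insn *insns,
			       struct bpf_insn *purged, int insn_cnt)
{
	int i;

	memcpy(purged, insns, insn_cnt * sizeof(*insns));
	for (i = 0; i < insn_cnt; i++) {
		if (purged[i].code != (BPF_LD | BPF_IMM | BPF_DW))
			continue;
		if (purged[i].src_reg != BPF_PSEUDO_MAP_FD &&
		    purged[i].src_reg != BPF_PSEUDO_MAP_VALUE &&
		    purged[i].src_reg != BPF_PSEUDO_BTF_ID)
			continue;
		purged[i].imm = 0;		/* fd / btf_id in insn[0] */
		if (i + 1 < insn_cnt)
			purged[i + 1].imm = 0;	/* upper half in insn[1] */
		i++;				/* skip insn[1], handled above */
	}
}
```

With this shape there's no state to reason about: each ldimm64 is
consumed as an insn[0]/insn[1] pair and the loop skips past it.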

[...]

> > > +       map->excl_prog_sha = prog->hash;
> > > +       map->excl_prog_sha_size = SHA256_DIGEST_LENGTH;
> >
> > this is a hack, I assume that's why you compute that hash for any
> > program all the time, right? Well, first, if this is called before
> > bpf_object_prepare(), it will silently do the wrong thing.
> >
> > But also I don't think we should calculate hash proactively, we could
> > do this lazily.
> >

So this bothered me and felt wrong. And I realized that you are doing
it at the wrong abstraction level here. Instead of trying to calculate
exclusivity hash in bpf_map__make_exclusive(), we should just record
`struct bpf_program *` pointer as an exclusive target bpf program. And
only calculate hash at map creation time where we know that the BPF
program has been relocated.

At that time we can also error out if the bpf_program turns out to
have loading disabled (autoload == false), etc.

I'd also name this more like any other setter (and would add a getter
for completeness): bpf_map__set_exclusive_program() and
bpf_map__exclusive_program(). And I guess we can allow overwriting the
exclusive program pointer, given the explicit setter semantics.
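A toy sketch of that flow, with stand-in structs (these are not the
real libbpf internals, and the proposed function names are suggestions,
not an existing API): the setter only records the program pointer, and
the hash would be computed later, at map-creation time, once
relocations are done.

```c
#include <errno.h>
#include <stddef.h>

/* Toy stand-ins for libbpf-internal types, for illustration only */
struct bpf_program { int autoload; /* ... */ };
struct bpf_map {
	struct bpf_program *excl_prog;	/* recorded target, no hash yet */
	int created;
};

/* Record the exclusive program; no hash computation happens here.
 * Overwriting a previously set pointer is allowed, given the
 * explicit setter semantics.
 */
static int bpf_map__set_exclusive_program(struct bpf_map *map,
					  struct bpf_program *prog)
{
	if (map->created)
		return -EBUSY;	/* too late once the map exists */
	map->excl_prog = prog;
	return 0;
}

static struct bpf_program *
bpf_map__exclusive_program(const struct bpf_map *map)
{
	return map->excl_prog;
}
```

At map-creation time, the creation path would then look up
map->excl_prog, verify it's loadable, and compute the hash over the
fully relocated instructions.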

> > > +       return 0;
> > > +}
> > > +
> > > +
> > >  static struct bpf_map *
> > >  __bpf_map__iter(const struct bpf_map *m, const struct bpf_object *obj, int i)
> > >  {

[...]


