[PATCH v2 bpf-next 0/9] add new acquire/release BPF kfuncs

Alexei Starovoitov alexei.starovoitov at gmail.com
Wed Mar 13 21:05:13 UTC 2024


On Mon, Mar 11, 2024 at 5:01 AM Christian Brauner <brauner at kernel.org> wrote:
>
> > > > One can argue that get_mm_exe_file() is not exported,
> > > > but it's nothing but rcu_lock-wrap plus get_file_rcu()
> > > > which is EXPORT_SYMBOL.
> > >
> > > Oh, good spot. That's an accident. get_file_rcu() definitely shouldn't
> > > be exported. So that'll be removed asap.
> >
> > So, just to make a point that
> > "Included in that set are functions that aren't currently even
> > exported to modules"
> > you want to un-export get_file_rcu() ?
>
> No. The reason it was exported was because of the drm subsystem and we
> already quite disliked that. But it turned out that's not needed so in
> commit 61d4fb0b349e ("file, i915: fix file reference for
> mmap_singleton()") they were moved away from this helper.

Arguably that commit 61d4fb0b349e should have had
Fixes: 0ede61d8589c ("file: convert to SLAB_TYPESAFE_BY_RCU")
i915 was buggy before you touched it
and safe_by_rcu exposed the bug.
I can see why you guys looked at it, saw issues,
and decided to look away.
Though your guess in commit 61d4fb0b349e
"
    Now, there might be delays until
    file->f_op->release::singleton_release() is called and
    i915->gem.mmap_singleton is set to NULL.
"
feels unlikely.
I suspect the release() delay cannot be long enough to cause an rcu stall.
In the log prior to the splat there are just two mmap related calls
from selftests in i915_gem_mman_live_selftests():
i915: Running i915_gem_mman_live_selftests/igt_mmap_offset_exhaustion
i915: Running i915_gem_mman_live_selftests/igt_mmap
The 1st mmap test passed, but the 2nd failed.
So it looks like it's not a race, but an issue with cleanup in that driver.
And instead of getting to the bottom of the issue
you've decided to paper over it with get_file_active().
I agree with that trade-off.
But the bug in i915 is still there and it's probably a UAF.
get_file_active() is probably operating on a broken 'struct file'
whose refcount got to zero but which is somehow still around,
or it's just garbage memory and file->f_count
just happened to be zero.
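
For illustration, here is a minimal userspace sketch of the "take a reference only if the count is non-zero, then re-check the pointer" idea that this discussion revolves around. It is a model built on C11 atomics, not the kernel code: every toy_* name is hypothetical, and RCU and memory reuse are not modelled at all.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for 'struct file'; only the reference count is modelled. */
struct toy_file {
	atomic_long f_count;
};

/* Take a reference only if the count is still non-zero. */
static bool toy_inc_not_zero(atomic_long *cnt)
{
	long old = atomic_load(cnt);

	while (old != 0) {
		/* On failure 'old' is refreshed with the current value. */
		if (atomic_compare_exchange_weak(cnt, &old, old + 1))
			return true;
	}
	return false;
}

/*
 * Hypothetical model of an "active" lookup: load the pointer from a
 * slot, try to take a reference, then re-check that the slot still
 * points at the same object. With typesafe-by-RCU memory the object
 * could have been recycled between the load and the increment.
 */
static struct toy_file *toy_get_file_active(struct toy_file *_Atomic *slot)
{
	struct toy_file *f = atomic_load(slot);

	if (!f || !toy_inc_not_zero(&f->f_count))
		return NULL;
	if (f != atomic_load(slot)) {
		/* Slot changed under us: drop the reference we just took. */
		atomic_fetch_sub(&f->f_count, 1);
		return NULL;
	}
	return f;
}
```

In this model a count of zero always makes the lookup fail. That is the behaviour being relied on above, and it says nothing about whether the memory behind the pointer is still a valid object.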

My point is that it's not ok to have such double standards.
On one side you're arguing that we shouldn't introduce a kfunc:
+__bpf_kfunc struct file *bpf_get_task_exe_file(struct task_struct *task)
+{
+ return get_task_exe_file(task);
+}
that cleanly takes a ref cnt on task->mm->exe_file and does _not_ use the
lower level get_file/get_file_rcu/get_file_active api-s directly, which
are certainly problematic to expose anywhere, since the safe_by_rcu
protocol is delicate.

But on the other side there is the buggy i915 that does
a questionable dance with get_file_active().
It's EXPORT_SYMBOL_GPL as well, and an out-of-tree driver can
ruin safe_by_rcu file properties with hard-to-debug consequences.

> There is absolutely no way that any userspace will
> get access to such low-level helpers. They have zero business to be
> involved in the lifetimes of objects on this level just as no module has.

Correct, and kfuncs do not give a bpf prog direct get_file*() access,
because we saw how tricky safe_by_rcu is.
Hence kfuncs acquire the file via get_task_exe_file or get_mm_exe_file
and release it via fput.
That's the same pattern that security/tomoyo/util.c uses:
   exe_file = get_mm_exe_file(mm);
   if (!exe_file)
        return NULL;

   cp = tomoyo_realpath_from_path(&exe_file->f_path);
   fput(exe_file);

In the bpf_lsm case it will be:

   exe_file = bpf_get_mm_exe_file(mm);
   if (!exe_file)
        return 0;
   // the verifier will enforce that the bpf prog has this NULL check here
   // because we annotate the kfunc as:
   // BTF_ID_FLAGS(func, bpf_get_mm_exe_file,
   //              KF_ACQUIRE | KF_TRUSTED_ARGS | KF_RET_NULL)

   bpf_path_d_path(&exe_file->f_path, ...);
   bpf_put_file(exe_file);
   // and the verifier will enforce that bpf_put_file() is called too,
   // so there is no path out of this bpf program that can take a file
   // refcnt without releasing it.
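
The acquire/release pairing described above can be modelled in plain userspace C. This is only a sketch under stated assumptions: the toy_* types and functions are hypothetical stand-ins, the refcount is a plain long, and nothing here is verifier or kernel code. It only shows the shape of a function in which every exit path balances the acquire with a release.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for 'struct file' and 'struct mm_struct'. */
struct toy_file {
	long refcnt;
	const char *path;
};

struct toy_mm {
	struct toy_file *exe_file;
};

/* Acquire: returns NULL, or the file with one extra reference held. */
static struct toy_file *toy_get_mm_exe_file(struct toy_mm *mm)
{
	struct toy_file *f = mm->exe_file;

	if (!f)
		return NULL;
	f->refcnt++;
	return f;
}

/* Release: drops the reference taken by the acquire above. */
static void toy_put_file(struct toy_file *f)
{
	assert(f->refcnt > 0);
	f->refcnt--;
}

/*
 * Copy the executable path into buf. Returns the path length, or -1 if
 * there is no exe file or buf is too small. Once the acquire succeeds,
 * every way out of the function goes through toy_put_file().
 */
static int toy_exe_path(struct toy_mm *mm, char *buf, size_t len)
{
	struct toy_file *exe_file = toy_get_mm_exe_file(mm);
	int ret = -1;
	size_t n;

	if (!exe_file)
		return -1;	/* nothing was acquired, nothing to release */

	n = strlen(exe_file->path);
	if (n < len) {
		memcpy(buf, exe_file->path, n + 1);
		ret = (int)n;
	}
	toy_put_file(exe_file);
	return ret;
}
```

After any call to toy_exe_path() the refcount is back at its starting value, whether the call succeeded, failed on a short buffer, or found no exe file. In the bpf case that invariant is not left to the programmer: the verifier rejects programs that break it.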

So really these kfuncs are a nop from vfs pov.
If there is a bug in the verifier we will debug it and we will fix it.

You keep saying that bpf_d_path() is a mess.
Right. It is a mess now and we're fixing it.
When it was introduced 4 years ago it was safe.
The unrelated verifier "smartness" made it possible to use it in a UAF.
We found the issue now and we're fixing it.
Over these years we didn't ask vfs folks to help fix such bugs,
and we're not asking for help now.
You're being cc-ed on the patches to be aware of how we plan to fix
this bpf_d_path() mess. If you have a viable alternative please suggest it.
As it stands, the new kfuncs are a clean and safe way to solve this mess.


