[PATCH] xfrm: kill xfrm_dev_{state,policy}_flush_secctx_check()

Fri Feb 27 01:14:43 UTC 2026

On Fri, Feb 13, 2026 at 5:19 AM Steffen Klassert
<steffen.klassert at secunet.com> wrote:
> On Mon, Feb 09, 2026 at 11:26:14PM +0900, Tetsuo Handa wrote:
> > On 2026/02/09 20:22, Steffen Klassert wrote:
> > > On Mon, Feb 09, 2026 at 07:02:47PM +0900, Tetsuo Handa wrote:
> > >> On 2026/02/09 18:25, Steffen Klassert wrote:

...

> And here we come to the other problem I mentioned. When a LSM policy
> rejects to flush the xfrm states and policies on network namespace
> exit, we leak all the xfrm states and policies in that namespace.
> Here we have no other option, we must flush the xfrm states and
> policies regardless of any LSM policy. This can be fixed with
> something like that:
>
> diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
> index 72678053bd69..8a4b2cbba0e0 100644
> --- a/net/xfrm/xfrm_policy.c
> +++ b/net/xfrm/xfrm_policy.c
> @@ -1822,9 +1822,11 @@ int xfrm_policy_flush(struct net *net, u8 type, bool task_valid)
>
>         spin_lock_bh(&net->xfrm.xfrm_policy_lock);
>
> -       err = xfrm_policy_flush_secctx_check(net, type, task_valid);
> -       if (err)
> -               goto out;
> +       if (task_valid) {
> +               err = xfrm_policy_flush_secctx_check(net, type, task_valid);
> +               if (err)
> +                       goto out;
> +       }
>
>  again:
>         list_for_each_entry(pol, &net->xfrm.policy_all, walk.all) {
> diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
> index f2aef404b583..fd00f2d20425 100644
> --- a/net/xfrm/xfrm_state.c
> +++ b/net/xfrm/xfrm_state.c
> @@ -923,9 +923,11 @@ int xfrm_state_flush(struct net *net, u8 proto, bool task_valid)
>         int i, err = 0, cnt = 0;
>
>         spin_lock_bh(&net->xfrm.xfrm_state_lock);
> -       err = xfrm_state_flush_secctx_check(net, proto, task_valid);
> -       if (err)
> -               goto out;
> +       if (task_valid) {
> +               err = xfrm_state_flush_secctx_check(net, proto, task_valid);
> +               if (err)
> +                       goto out;
> +       }
>
>         err = -ESRCH;
>         for (i = 0; i <= net->xfrm.state_hmask; i++) {

This seems reasonable to me, and please correct me if I'm wrong, but
if the network namespace is gone at this point, we don't really have
to worry about traffic from any applications because there should no
longer be any processes in that namespace, yes?  I suppose there is
still a chance we'll see inbound traffic for endpoints in that
namespace, but I imagine the initns stack will reject it.

> > >> For example, we don't check permission for unmount when a mount is deleted
> > >> due to teardown of a mount namespace. I wonder why you want to check permission
> > >> for unregistering a net_device when triggered by a teardown path.
> > >
> > > I just try to find out what's the right thing to do here.
> > > If a policy goes away, packets that match this policy will
> > > find another path through the network stack. As best, they
> > > are dropped somewhere, but they can also leave on some other
> > > device without encryption. A LSM that implements xfrm hooks
> > > must be able to check the permission to delete the xfrm policy
> > > or state.
> >
> > Do you mean that calling xfrm_dev_down()/xfrm_dev_unregister() might
> > result in network traffic to be sent in cleartext ?
>
> Yes this can happen, but it is known. You can either install
> a global block policy with low priority or use a LSM to
> prevent this. The latter does not work unfortunately.

If I understand you correctly, the proposal below would address this
last part, yes?

> > If yes, we need to consider updating the other patch at
> > https://lkml.kernel.org/r/20260202123655.GK34749@unreal to replace
> > the NETDEV_UNREGISTER net_device with the blackhole_netdev. (That is,
> > xfrm_dev_{state,policy}_flush() does not actually delete a state/policy
> > but instead updates that state/policy to behave as a blackhole. Then,
> > we won't need to call LSM hooks because we no longer delete).
>
> I think there is a clean way to fix this. We could just unlink
> policy and state from the device. Then we could do the same as
> we do when a state becomes unavailable due to expiration. We mark
> the state as invalid with a flag. On expiration we do this with
> XFRM_STATE_EXPIRED. We can add a new flag and do the same as
> xfrm_state_check_expire() does on a hard expire. I.e. fire
> a timer that notifies the userspace key manager that this
> path is not avalable anymore and return an error. This way
> userspace is informed about that and all packets matching
> the policy are dropped.

This looks interesting.  The traffic would be dropped, and presumably
userspace could then cleanup the old state and establish any new
policy as required, yes?

> This is of course a bit more work and requires testing.

Tetsuo, Steffen, any chance one of the two of you would be able to
work on some patches for this?

-- 
paul-moore.com