[PATCH] xfrm: kill xfrm_dev_{state,policy}_flush_secctx_check()

Fri Feb 13 10:19:35 UTC 2026

On Mon, Feb 09, 2026 at 11:26:14PM +0900, Tetsuo Handa wrote:
> On 2026/02/09 20:22, Steffen Klassert wrote:
> > On Mon, Feb 09, 2026 at 07:02:47PM +0900, Tetsuo Handa wrote:
> >> On 2026/02/09 18:25, Steffen Klassert wrote:
> >>> The problem is that, with adding IPsec offloads to netdevices, security
> >>> critical resources came into the netdevices. Someone who has no
> >>> capabilities to delete xfrm states or xfrm policies should not be able
> >>> to unregister the netdevice if xfrm states or xfrm policies are
> >>> offloaded. Unfortunately, unregistering can't be canceled at this stage
> >>> anymore. So I think we need some netdevice unregistration hook for
> >>> the LSM subsystem so it can check for xfrm states or xfrm policies
> >>> and refuse the unregistration before we actually start to remove
> >>> the device.
> >>
> >> Unfortunately, unregistering is not always triggered by a user's request. ;-)
> > 
> > As far as I remember, a security context is not always tied to a
> > user request. It can also be attached to system tasks or objects.
> 
> That is not what I wanted to say. There are at least three routes (listed below)
> that can trigger xfrm_dev_unregister() path. You could insert LSM hooks into the
> netlink_sendmsg() route and the del_device_store() route, but the cleanup_net()
> route is a result of tear-down action which is too late to insert LSM hooks.

Yes, I know that.

> The NETDEV_UNREGISTER path can be triggered by just doing "unshare -n ip addr show"
> (i.e. implicit cleanup of a network namespace due to termination of init process in
> that namespace). We are not allowed to reject the cleanup_net() route.

And here we come to the other problem I mentioned. When a LSM policy
rejects to flush the xfrm states and policies on network namespace
exit, we leak all the xfrm states and policies in that namespace.
Here we have no other option, we must flush the xfrm states and
policies regardless of any LSM policy. This can be fixed with
something like that:

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 72678053bd69..8a4b2cbba0e0 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -1822,9 +1822,11 @@ int xfrm_policy_flush(struct net *net, u8 type, bool task_valid)
 
 	spin_lock_bh(&net->xfrm.xfrm_policy_lock);
 
-	err = xfrm_policy_flush_secctx_check(net, type, task_valid);
-	if (err)
-		goto out;
+	if (task_valid) {
+		err = xfrm_policy_flush_secctx_check(net, type, task_valid);
+		if (err)
+			goto out;
+	}
 
 again:
 	list_for_each_entry(pol, &net->xfrm.policy_all, walk.all) {
diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c
index f2aef404b583..fd00f2d20425 100644
--- a/net/xfrm/xfrm_state.c
+++ b/net/xfrm/xfrm_state.c
@@ -923,9 +923,11 @@ int xfrm_state_flush(struct net *net, u8 proto, bool task_valid)
 	int i, err = 0, cnt = 0;
 
 	spin_lock_bh(&net->xfrm.xfrm_state_lock);
-	err = xfrm_state_flush_secctx_check(net, proto, task_valid);
-	if (err)
-		goto out;
+	if (task_valid) {
+		err = xfrm_state_flush_secctx_check(net, proto, task_valid);
+		if (err)
+			goto out;
+	}
 
 	err = -ESRCH;
 	for (i = 0; i <= net->xfrm.state_hmask; i++) {

> > 
> >> For example, we don't check permission for unmount when a mount is deleted
> >> due to teardown of a mount namespace. I wonder why you want to check permission
> >> for unregistering a net_device when triggered by a teardown path.
> > 
> > I just try to find out what's the right thing to do here.
> > If a policy goes away, packets that match this policy will
> > find another path through the network stack. As best, they
> > are dropped somewhere, but they can also leave on some other
> > device without encryption. A LSM that implements xfrm hooks
> > must be able to check the permission to delete the xfrm policy
> > or state.
> 
> Do you mean that calling xfrm_dev_down()/xfrm_dev_unregister() might
> result in network traffic to be sent in cleartext ?

Yes this can happen, but it is known. You can either install
a global block policy with low priority or use a LSM to
prevent this. The latter does not work unfortunately.

> 
> If yes, we need to consider updating the other patch at
> https://lkml.kernel.org/r/20260202123655.GK34749@unreal to replace
> the NETDEV_UNREGISTER net_device with the blackhole_netdev. (That is,
> xfrm_dev_{state,policy}_flush() does not actually delete a state/policy
> but instead updates that state/policy to behave as a blackhole. Then,
> we won't need to call LSM hooks because we no longer delete).

I think there is a clean way to fix this. We could just unlink
policy and state from the device. Then we could do the same as
we do when a state becomes unavailable due to expiration. We mark
the state as invalid with a flag. On expiration we do this with
XFRM_STATE_EXPIRED. We can add a new flag and do the same as
xfrm_state_check_expire() does on a hard expire. I.e. fire
a timer that notifies the userspace key manager that this
path is not avalable anymore and return an error. This way
userspace is informed about that and all packets matching
the policy are dropped.

This is of course a bit more work and requires testing.