[PATCH net] tcp: drop skb extensions before skb_attempt_defer_free

Wed Feb 12 16:15:41 UTC 2025

On Tue, Feb 11, 2025 at 1:51 PM Sabrina Dubroca <sd at queasysnail.net> wrote:
> 2025-02-10, 17:24:43 +0100, Eric Dumazet wrote:
> > On Mon, Feb 10, 2025 at 5:02 PM Sabrina Dubroca <sd at queasysnail.net> wrote:
> > >
> > > Xiumei reported hitting the WARN in xfrm6_tunnel_net_exit while
> > > running tests that boil down to:
> > >  - create a pair of netns
> > >  - run a basic TCP test over ipcomp6
> > >  - delete the pair of netns
> > >
> > > The xfrm_state found on spi_byaddr was not deleted at the time we
> > > delete the netns, because we still have a reference on it. This
> > > lingering reference comes from a secpath (which holds a ref on the
> > > xfrm_state), which is still attached to an skb. This skb is not
> > > leaked, it ends up on sk_receive_queue and then gets defer-free'd by
> > > skb_attempt_defer_free.
> > >
> > > The problem happens when we defer freeing an skb (push it on one CPU's
> > > defer_list), and don't flush that list before the netns is deleted. In
> > > that case, we still have a reference on the xfrm_state that we don't
> > > expect at this point.
> > >
> > > tcp_eat_recv_skb is currently the only caller of skb_attempt_defer_free,
> > > so I'm fixing it here. This patch also adds a DEBUG_NET_WARN_ON_ONCE
> > > in skb_attempt_defer_free, to make sure we don't re-introduce this
> > > problem.
> > >
> > > Fixes: 68822bdf76f1 ("net: generalize skb freeing deferral to per-cpu lists")
> > > Reported-by: Xiumei Mu <xmu at redhat.com>
> > > Signed-off-by: Sabrina Dubroca <sd at queasysnail.net>
> > > ---
> > > A few comments:
> > >  - AFAICT this could not happen before 68822bdf76f1, since we would
> > >    have emptied the (per-socket) defer_list before getting to ->exit()
> > >    for the netns
> > >  - I thought about dropping the extensions at the same time as we
> > >    already drop the dst, but Paolo said this is probably not correct due
> > >    to IP_CMSG_PASSSEC
> >
> > I think we discussed this issue in the past.
> >
> > Are you sure IP_CMSG_PASSSEC  is ever used by TCP ?
>
> After checking, I don't think so. The only way TCP can get to
> IP_CMSG_PASSSEC is through the error queue, so it shouldn't matter.
>
> The original commit (2c7946a7bf45 ("[SECURITY]: TCP/UDP getpeersec"))
> also says that TCP should be using SO_PEERSEC for that purpose
> (although likely based on the secpath as well, but not packet per
> packet).
>
> Based on the chat you had with Paul Moore back in November, it seems
> any point after tcp_filter should be fine:
> https://lore.kernel.org/netdev/CAHC9VhS3yuwrOPcH5_iRy50O_TtBCh_OVWHZgzfFTYqyfrw_zQ@mail.gmail.com

[NOTE: CC'ing the LSM so others are aware of this discussion]

I can't say I've got the full context for this discussion, but my
comments from last November should still hold true: if a packet/skb is
still "alive" and able to be accessed by a task then we need to
preserve it from a LSM perspective.

As far as IP_CMSG_PASSSEC is concerned, while I would normally expect
TCP/stream connections to use the SO_PEERSEC method to determine the
peer security information, unless there is something in the stack or
socket API that explicitly prevents the use of IP_CMSG_PASSSEC on
UDP/datagram packets I would imagine it is possible that there is an
application somewhere which makes use of it.  The LSM framework, and
all of the LSMs that implement the IP_CMSG_PASSSEC hook callback,
should work just fine regardless of if the packet is TCP or UDP.

-- 
paul-moore.com