[PATCH net-next 06/10] cipso_ipv4: use iph_set_totlen in skbuff_setattr

Tue Jan 17 22:46:49 UTC 2023

On Tue, Jan 17, 2023 at 2:51 PM Paul Moore <paul at paul-moore.com> wrote:
> On Mon, Jan 16, 2023 at 2:35 PM Xin Long <lucien.xin at gmail.com> wrote:
> > On Mon, Jan 16, 2023 at 1:13 PM Paul Moore <paul at paul-moore.com> wrote:
> > > On Mon, Jan 16, 2023 at 12:37 PM Xin Long <lucien.xin at gmail.com> wrote:
> > > > On Mon, Jan 16, 2023 at 11:46 AM Paul Moore <paul at paul-moore.com> wrote:
> > > > > On Sat, Jan 14, 2023 at 12:54 PM Xin Long <lucien.xin at gmail.com> wrote:
> > > > > > On Sat, Jan 14, 2023 at 10:39 AM Paul Moore <paul at paul-moore.com> wrote:
> > > > > > > On Fri, Jan 13, 2023 at 10:31 PM Xin Long <lucien.xin at gmail.com> wrote:
>
> ...
>
> > > > > We can't skip the CIPSO labeling as that would be the network packet
> > > > > equivalent of not assigning an owner/group/mode to a file on the
> > > > > filesystem, which is a Very Bad Thing :)
> > > > >
> > > > > I spent a little bit of time this morning looking at the problem and I
> > > > > think the right approach is two-fold: first introduce a simple check
> > > > > in cipso_v4_skbuff_setattr() which returns -E2BIG if the packet length
> > > > > grows beyond 65535.  It's rather crude, but it's a tiny patch and
> > > > > should at least ensure that the upper layers (NetLabel and SELinux)
> > > > > don't send the packet with a bogus length field; it will result in
> > > > > packet drops, but honestly that seems preferable to a mangled packet
> > > > > which will likely be dropped at some point in the network anyway.
> > > > >
> > > > > diff --git a/net/ipv4/cipso_ipv4.c b/net/ipv4/cipso_ipv4.c
> > > > > index 6cd3b6c559f0..f19c9beda745 100644
> > > > > --- a/net/ipv4/cipso_ipv4.c
> > > > > +++ b/net/ipv4/cipso_ipv4.c
> > > > > @@ -2183,8 +2183,10 @@ int cipso_v4_skbuff_setattr(struct sk_buff *skb,
> > > > >         * that the security label is applied to the packet - we do the same
> > > > >         * thing when using the socket options and it hasn't caused a problem,
> > > > >         * if we need to we can always revisit this choice later */
> > > > > -
> > > > >        len_delta = opt_len - opt->optlen;
> > > > > +       if ((skb->len + len_delta) > 65535)
> > > > > +               return -E2BIG;
> > > > > +
> > > >
> > > > Right, looks crude. :-)
> > >
> > > Yes, but what else can we do?  There is fragmentation, but that is
> > > rather ugly and we would still need a solution for when the don't
> > > fragment bit is set.  I'm open to suggestions.
> >
> > looking at ovs_dp_upcall(), for GSO/GRO packets it goes to
> > queue_gso_packets() where it calls __skb_gso_segment()
> > to segment it into small segs/skbs, then process these segs instead.
> >
> > I'm thinking you can try to do the same in cipso_v4_skbuff_setattr(),
> > and I don't think 64K non-GSO packets exist in the user environment,
> > so taking care of GSO packets should be enough.
>
> Thanks, I'll take a look.

Unfortunately I don't think the ovs_dp_upcall() approach will work as
that is an endpoint in the kernel which sends the GSO'd packet up to
userspace in segments.  In the case of cipso_v4_skbuff_setattr() we
are setting an IPv4 option on a packet in either the NF_INET_LOCAL_OUT
or NF_INET_FORWARD output path.  I believe we can resolve the
LOCAL_OUT case with the padding approach I mentioned previously, but
the FORWARD path remains a challenge; I simply don't see a way to
handle growing the packet beyond 64k in the forward path.  I'm also
realizing that we should be sending an ICMP_FRAG_NEEDED in the forward
case when we have to drop the packet due to size issues, as the normal
MTU/size check happens prior to the NF_INET_FORWARD hooks (and hence
cipso_v4_skbuff_setattr()).
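
To make the LOCAL_OUT/FORWARD distinction concrete, here is a rough
userspace sketch of the size check and the resulting verdict.  It is
not kernel code: the enum names and both helper functions are
hypothetical, and it assumes the crude "drop on overflow" behavior from
the quoted patch for LOCAL_OUT rather than the padding approach.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define IPV4_MAX_TOT_LEN 65535  /* largest value the 16-bit total length holds */
#define IPV4_MAX_OPT_LEN 40     /* IPv4 options occupy at most 40 bytes */

/* Hypothetical stand-ins for the two netfilter hooks discussed above. */
enum hook_path { PATH_LOCAL_OUT, PATH_FORWARD };

/* Hypothetical verdicts; the real code would return an errno or NF_* value. */
enum verdict { VERDICT_OK, VERDICT_DROP, VERDICT_DROP_SEND_FRAG_NEEDED };

/* Return 0 if the packet still fits after the option area changes by
 * len_delta bytes, -E2BIG otherwise (same test as the quoted patch). */
static int cipso_len_check(size_t pkt_len, int len_delta)
{
	assert(len_delta >= -IPV4_MAX_OPT_LEN && len_delta <= IPV4_MAX_OPT_LEN);

	/* Widen to a signed type so a negative delta cannot wrap. */
	if ((long long)pkt_len + len_delta > IPV4_MAX_TOT_LEN)
		return -E2BIG;
	return 0;
}

/* Assumption: on FORWARD the sender should be told via ICMP_FRAG_NEEDED,
 * since the normal MTU check already ran before the hook. */
static enum verdict cipso_label_verdict(enum hook_path path, size_t pkt_len,
					int len_delta)
{
	if (cipso_len_check(pkt_len, len_delta) == 0)
		return VERDICT_OK;
	return path == PATH_FORWARD ? VERDICT_DROP_SEND_FRAG_NEEDED
				    : VERDICT_DROP;
}
```

For example, a 65530 byte packet that needs 12 more bytes of option
space fails the check on either path; only the forward path would also
generate the ICMP error.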

> > > It seems like there is still ongoing discussion about even enabling
> > > BIG TCP for IPv4, however for this discussion let's assume that BIG
> > > TCP is merged for IPv4.
> > >
> > > We really should have a solution that allows CIPSO for both normal and
> > > BIG TCP, if we don't we force distros and admins to choose between the
> > > two and that isn't good.  We should do better.  If skb->len > 64k in
> > > the case of BIG TCP, how is the packet eventually divided/fragmented
> > > in such a way that the total length field in the IPv4 header doesn't
> > > overflow?  Or is that simply handled at the driver/device layer and we
> > > simply set skb->len to whatever the size is, regardless of the 16-bit
> >
> > Yes, for BIG TCP, 16-bit length is set to 0, and it just uses skb->len
> > as the IP packet length.
>
> In the BIG TCP case, when is the IPv4 header zero'd out?  Currently
> cipso_v4_skbuff_setattr() is called in the NF_INET_LOCAL_OUT and
> NF_INET_FORWARD chains, is there an easy way to distinguish between a
> traditional segmentation offload mechanism, e.g. GSO, and BIG TCP?  If
> BIG TCP allows for arbitrarily large packets we can just grow the
> skb->len value as needed and leave the total length field in the IPv4
> header untouched/zero, but we would need to be able to distinguish
> between a segmentation offload and BIG TCP.

I'm keeping the above questions as they still apply; I could still use
some help understanding what a BIG TCP packet would look like during
LOCAL_OUT and FORWARD.
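
For reference, the convention described in the quoted reply (total
length field set to 0, skb->len used as the packet length) can be
modeled in userspace roughly as below.  This is only a sketch under
that assumption: struct model_iphdr and both model_* helpers are
hypothetical names, not the kernel's helpers, and the model ignores
any GSO type checks the real code may perform.

```c
#include <arpa/inet.h>  /* htons(), ntohs() */
#include <stdint.h>

#define MODEL_IP_MAX_LEN 65535u  /* limit of the 16-bit total length field */

/* Hypothetical, stripped-down header: only the field under discussion. */
struct model_iphdr {
	uint16_t tot_len;  /* network byte order; 0 means "use skb->len" */
};

/* Store len in the header if it fits in 16 bits, otherwise store 0. */
static void model_set_totlen(struct model_iphdr *iph, unsigned int len)
{
	iph->tot_len = len <= MODEL_IP_MAX_LEN ? htons((uint16_t)len) : 0;
}

/* Read the length back, falling back to the buffer length when the
 * header field is 0 (the oversized packet case). */
static unsigned int model_totlen(const struct model_iphdr *iph,
				 unsigned int skb_len)
{
	return iph->tot_len ? ntohs(iph->tot_len) : skb_len;
}
```

Under this model, growing an oversized packet by the option length
only changes skb_len and leaves the header field at 0, which is the
behavior the questions above are trying to confirm.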

> > > In the GRO case, is it safe to grow the packet such that skb->len is
> > > greater than 64k?  I presume that the device/driver is going to split
> > > the packet anyway and populate the IPv4 total length fields in the
> > > header anyway, right?  If we can't grow the packet beyond 64k, is
> > > there some way to signal to the driver/device at runtime that the
> > > largest packet we can process is 64k minus 40 bytes (for the IPv4
> > > options)?
> >
> > at runtime, not as far as I know.
> > It's a field of the network device that can be modified by:
> > # ip link set dev eth0 gro_max_size $MAX_SIZE gso_max_size $MAX_SIZE
>
> I need to look at the OVS case above, but one possibility would be to
> have the kernel adjust the GSO size down by 40 bytes when
> CONFIG_NETLABEL is enabled, but that isn't a great option, and not
> something I consider a first (or second) choice.

Looking more at the GSO related code, this isn't likely to work.

-- 
paul-moore.com