[net-next v4 1/6] net: Documentation on QUIC kernel Tx crypto.

Adel Abouchaev adel.abushaev at gmail.com
Fri Sep 9 00:12:33 UTC 2022

Add documentation for kernel QUIC code.

Signed-off-by: Adel Abouchaev <adel.abushaev at gmail.com>


Added quic.rst reference to the index.rst file; fixed indentation in the
quic.rst file.
Reported-by: kernel test robot <lkp at intel.com>

Added SPDX license GPL 2.0.
v2: Removed whitespace at EOF.
v3: Added explanation of features.
v4: Updated and formatted the doc for readability.
 Documentation/networking/index.rst |   1 +
 Documentation/networking/quic.rst  | 215 +++++++++++++++++++++++++++++
 2 files changed, 216 insertions(+)
 create mode 100644 Documentation/networking/quic.rst

diff --git a/Documentation/networking/index.rst b/Documentation/networking/index.rst
index bacadd09e570..0dacd8c8a3ff 100644
--- a/Documentation/networking/index.rst
+++ b/Documentation/networking/index.rst
@@ -89,6 +89,7 @@ Contents:
+   quic
diff --git a/Documentation/networking/quic.rst b/Documentation/networking/quic.rst
new file mode 100644
index 000000000000..48861c458381
--- /dev/null
+++ b/Documentation/networking/quic.rst
@@ -0,0 +1,215 @@
+.. SPDX-License-Identifier: GPL-2.0
+QUIC is a secure general-purpose transport protocol that creates a stateful
+interaction between a client and a server. QUIC provides end-to-end integrity
+and confidentiality. Refer to RFC 9000 [#rfc9000]_ for the standard document.
+
+The kernel Tx side offload covers the encryption of the application streams
+in the kernel rather than in the application. These packets are the 1-RTT
+packets of a QUIC connection. Encryption of all other packets is still done by
+the QUIC library in userspace.
+
+The flow match is performed using 5 parameters: source and destination IP
+addresses, source and destination UDP ports and destination QUIC connection ID.
+Not all of these parameters are always needed. The Tx direction matches the
+flow on the destination IP, port and destination connection ID, while the Rx
+direction would later match on source IP, port and destination connection ID.
+This covers multiple scenarios where the server is using ``SO_REUSEADDR``
+and/or empty destination connection IDs, or a combination of these.
+
+The Rx direction is not implemented yet.
+
+The connection migration scenario is not handled by the kernel code and will
+be handled by the userspace portion of the QUIC library. In the Tx direction,
+the new key would be installed before a packet with an updated destination is
+sent. In the Rx direction, the behavior will be to drop a packet if a matching
+flow is not found.
+
+For the key rotation, the behavior is to drop packets on Tx when the encryption
+key with a matching key rotation bit is not present. In the Rx direction, the
+packet will be sent to the userspace library with an unencrypted header and an
+encrypted payload. A separate indication will be added to the ancillary data to
+indicate the status of the operation as not matching the current key bit. It is
+not possible to use the key rotation bit as part of the key for flow lookup, as
+that bit is protected by the header protection. A special provision will need
+to be made in userspace to keep attempting the payload decryption to prevent
+timing attacks.
+
+User Interface
+==============
+
+Creating a QUIC connection
+--------------------------
+
+A QUIC connection originates and terminates in the application, using one of
+many available QUIC libraries. The code instantiates the client and server in
+some form and configures them to use certain addresses and ports for the
+source and destination. The client and server negotiate the set of keys to
+protect the communication during different phases of the connection, maintain
+the connection and perform congestion control.
+
+Requesting to add QUIC Tx kernel encryption to the connection
+-------------------------------------------------------------
+
+Each flow that should be encrypted by the kernel needs to be registered with
+the kernel using the socket API. A ``setsockopt()`` call on the socket creates
+an association between the QUIC connection ID of the flow and the encryption
+parameters for the crypto operations:
+
+.. code-block:: c
+
+	struct quic_connection_info conn_info;
+	/* Example values only; real keys come from the QUIC handshake. */
+	char conn_id[5] = {0x01, 0x02, 0x03, 0x04, 0x05};
+	const size_t conn_id_len = sizeof(conn_id);
+	char conn_key[16] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			     0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f};
+	char conn_iv[12] = {0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+			    0x08, 0x09, 0x0a, 0x0b};
+	char conn_hdr_key[16] = {0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+				 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f
+				};
+
+	conn_info.conn_payload_key_gen = 0;
+	conn_info.cipher_type = TLS_CIPHER_AES_GCM_128;
+
+	/* The connection ID is stored right-aligned in the zeroed key. */
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.conn_id_length = conn_id_len;
+	memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
+				      - conn_id_len],
+	       conn_id, conn_id_len);
+
+	memcpy(&conn_info.payload_key, conn_key, sizeof(conn_key));
+	memcpy(&conn_info.payload_iv, conn_iv, sizeof(conn_iv));
+	memcpy(&conn_info.header_key, conn_hdr_key, sizeof(conn_hdr_key));
+
+	if (setsockopt(fd, SOL_UDP, UDP_QUIC_ADD_TX_CONNECTION, &conn_info,
+		       sizeof(conn_info)) < 0)
+		perror("setsockopt(UDP_QUIC_ADD_TX_CONNECTION)");
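
The right-aligned placement of the connection ID used above can be sketched on its own, outside the proposed kernel API. This is a minimal illustration under stated assumptions: `struct example_conn_key`, `example_set_conn_id()` and `EXAMPLE_MAX_CONN_ID_SIZE` are hypothetical stand-ins, and the 20-byte limit is the QUIC version 1 maximum from RFC 9000 rather than a value taken from this series.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: 20 bytes, the largest connection ID QUIC version 1 allows.
 * The header in this series may define QUIC_MAX_CONNECTION_ID_SIZE
 * differently. */
#define EXAMPLE_MAX_CONN_ID_SIZE 20

/* Hypothetical stand-in for struct quic_connection_info_key. */
struct example_conn_key {
	uint8_t conn_id[EXAMPLE_MAX_CONN_ID_SIZE];
	uint8_t conn_id_length;
};

/*
 * Zero the key, then copy the ID into the tail of the fixed-size buffer so
 * that unused leading bytes stay zero.
 * Returns 0 on success, -1 if the ID does not fit.
 */
static int example_set_conn_id(struct example_conn_key *key,
			       const uint8_t *id, size_t id_len)
{
	if (id_len > EXAMPLE_MAX_CONN_ID_SIZE)
		return -1;

	memset(key, 0, sizeof(*key));
	key->conn_id_length = (uint8_t)id_len;
	if (id_len)
		memcpy(&key->conn_id[EXAMPLE_MAX_CONN_ID_SIZE - id_len],
		       id, id_len);
	return 0;
}
```

With a 5-byte ID, bytes 0 to 14 of the buffer remain zero and the ID occupies bytes 15 to 19, so two keys built this way can be compared with a plain `memcmp()` over the whole structure.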
+
+Requesting to remove QUIC Tx kernel crypto offload control messages
+-------------------------------------------------------------------
+
+All flows are removed when the socket is closed. To explicitly remove the
+offload for a connection during the lifetime of the socket, the process is
+similar to adding the flow. Only the connection ID and its length need to be
+supplied to remove the connection from the offload:
+
+.. code-block:: c
+
+	/* Only the connection ID and its length identify the flow to remove. */
+	memset(&conn_info.key, 0, sizeof(struct quic_connection_info_key));
+	conn_info.key.conn_id_length = conn_id_len;
+	memcpy(&conn_info.key.conn_id[QUIC_MAX_CONNECTION_ID_SIZE
+				      - conn_id_len],
+	       conn_id, conn_id_len);
+	setsockopt(fd, SOL_UDP, UDP_QUIC_DEL_TX_CONNECTION, &conn_info,
+		   sizeof(conn_info));
+
+Sending application data
+------------------------
+
+For Tx encryption offload, the application should use the ``sendmsg()`` socket
+call and provide ancillary data with information on the connection ID length
+and offload flags, so that the kernel can perform the encryption and GSO
+support if required:
+
+.. code-block:: c
+
+	/* `client` and `self->sfd` are defined by the surrounding program. */
+	size_t cmsg_tx_len = sizeof(struct quic_tx_ancillary_data);
+	uint8_t cmsg_buf[CMSG_SPACE(cmsg_tx_len)];
+	struct quic_tx_ancillary_data *anc_data;
+	size_t quic_data_len = 4500;
+	struct cmsghdr *cmsg_hdr;
+	char quic_data[9000];
+	struct iovec iov[2];
+	struct msghdr msg;
+	int err;
+
+	/* Two 4500-byte buffers, sent as one message. */
+	iov[0].iov_base = quic_data;
+	iov[0].iov_len = quic_data_len;
+	iov[1].iov_base = quic_data + quic_data_len;
+	iov[1].iov_len = quic_data_len;
+
+	/* Clear the header so that unused fields are not left uninitialized. */
+	memset(&msg, 0, sizeof(msg));
+	memset(cmsg_buf, 0, sizeof(cmsg_buf));
+	if (client.addr.sin_family == AF_INET) {
+		msg.msg_name = &client.addr;
+		msg.msg_namelen = sizeof(client.addr);
+	} else {
+		msg.msg_name = &client.addr6;
+		msg.msg_namelen = sizeof(client.addr6);
+	}
+	msg.msg_iov = iov;
+	msg.msg_iovlen = 2;
+	msg.msg_control = cmsg_buf;
+	msg.msg_controllen = sizeof(cmsg_buf);
+
+	/* Ancillary data asking the kernel to encrypt the payload. */
+	cmsg_hdr = CMSG_FIRSTHDR(&msg);
+	cmsg_hdr->cmsg_level = IPPROTO_UDP;
+	cmsg_hdr->cmsg_type = UDP_QUIC_ENCRYPT;
+	cmsg_hdr->cmsg_len = CMSG_LEN(cmsg_tx_len);
+	anc_data = (struct quic_tx_ancillary_data *)CMSG_DATA(cmsg_hdr);
+	anc_data->flags = 0;
+	anc_data->next_pkt_num = 0x0d65c9;
+	anc_data->conn_id_length = conn_id_len;
+
+	err = sendmsg(self->sfd, &msg, 0);
+	if (err < 0)
+		perror("sendmsg");
+
+The kernel QUIC Tx offload reads the data from userspace, and encrypts and
+copies it into the ciphertext within the same operation.
+
+Sending QUIC application data with GSO
+--------------------------------------
+
+When GSO is in use, the kernel will use the GSO fragment size as the target
+for ciphertext. The packets from userspace should be aligned on the boundary
+of the fragment size minus the tag size for the chosen cipher. For example,
+if the fragment size is 1200 bytes and the tag size is 16 bytes, the plaintext
+packets should follow each other at every 1184 bytes. After the encryption,
+the rest of the UDP and IP stacks will follow the defined value of the fragment,
+which includes the trailing tag bytes.
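
The alignment rule above is simple arithmetic, and a small helper may make it easier to check a send buffer layout. This is only a sketch: `example_plain_stride()` and `example_plain_offset()` are hypothetical names, not part of the proposed API.

```c
#include <stddef.h>

/*
 * Bytes each unencrypted packet may occupy in the user buffer:
 * the GSO fragment size minus the AEAD tag appended by the kernel.
 * Returns 0 when the fragment cannot hold anything besides the tag.
 */
static size_t example_plain_stride(size_t frag_size, size_t tag_size)
{
	return frag_size > tag_size ? frag_size - tag_size : 0;
}

/* Offset in the user buffer at which the n-th packet (0-based) starts. */
static size_t example_plain_offset(size_t frag_size, size_t tag_size,
				   size_t n)
{
	return n * example_plain_stride(frag_size, tag_size);
}
```

For the values in the text, `example_plain_stride(1200, 16)` gives 1184, so the third packet would start at offset `example_plain_offset(1200, 16, 2)`, which is 2368.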
+To set up GSO fragmentation:
+.. code-block:: c
+
+	int frag_size = 1200;	/* example value, matches the text above */
+
+	setsockopt(self->sfd, SOL_UDP, UDP_SEGMENT, &frag_size,
+		   sizeof(frag_size));
+
+If the fragment size is provided in ancillary data within the ``sendmsg()``
+call, the value in ancillary data takes precedence over the segment size
+set with ``setsockopt()`` when splitting the payload into packets. This is
+consistent with the UDP stack behavior.
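
A per-call segment size can be attached with the existing `UDP_SEGMENT` control message of the UDP stack. The sketch below covers only that control message; combining it with the `UDP_QUIC_ENCRYPT` ancillary data from this series would need a larger control buffer holding both headers. `example_set_gso_cmsg()` is a hypothetical helper, and the fallback defines are there only for older libc headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/udp.h>

#ifndef SOL_UDP
#define SOL_UDP 17		/* same value as IPPROTO_UDP */
#endif
#ifndef UDP_SEGMENT
#define UDP_SEGMENT 103		/* from linux/udp.h */
#endif

/*
 * Attach a per-call GSO segment size to msg, using the caller-provided
 * control buffer (which should be aligned for struct cmsghdr).
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int example_set_gso_cmsg(struct msghdr *msg, void *buf,
				size_t buf_len, uint16_t seg_size)
{
	struct cmsghdr *cm;

	if (buf_len < CMSG_SPACE(sizeof(seg_size)))
		return -1;

	memset(buf, 0, buf_len);
	msg->msg_control = buf;
	msg->msg_controllen = CMSG_SPACE(sizeof(seg_size));

	cm = CMSG_FIRSTHDR(msg);
	cm->cmsg_level = SOL_UDP;
	cm->cmsg_type = UDP_SEGMENT;
	cm->cmsg_len = CMSG_LEN(sizeof(seg_size));
	/* The UDP stack reads the segment size as a 16-bit value. */
	memcpy(CMSG_DATA(cm), &seg_size, sizeof(seg_size));
	return 0;
}
```

The caller would fill in `msg_name` and `msg_iov` as in the earlier `sendmsg()` example and then pass the message to `sendmsg()`; the segment size given here applies to that call only.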
+
+Integrating with userspace QUIC libraries
+-----------------------------------------
+
+Integration with a userspace QUIC library depends on that library's
+implementation of the QUIC protocol. For the MVFST library [#mvfst]_, the
+control plane is integrated into the handshake callbacks to properly configure
+the flows into the socket, and the data plane is integrated into the methods
+that perform encryption and send the packets to the batch scheduler for
+transmission to the socket.
+
+QUIC Tx offload to the kernel has the following counters:
+
+- ``QuicCurrTxSw`` -
+  number of currently active kernel offloaded QUIC connections
+- ``QuicTxSw`` -
+  cumulative total number of offloaded QUIC connections
+- ``QuicTxSwError`` -
+  cumulative total number of errors during QUIC Tx offload to kernel
+
+.. [#rfc9000] https://datatracker.ietf.org/doc/html/rfc9000
+.. [#mvfst] https://github.com/facebookincubator/mvfst
