[RFC PATCH 5/9] Define user structure for events and responses.

Tue Mar 11 19:29:03 UTC 2025

On Mon, Mar 10, 2025 at 12:39:08AM +0000, Tingmao Wang wrote:
> On 3/8/25 19:07, Mickaël Salaün wrote:
> > On Thu, Mar 06, 2025 at 03:05:10AM +0000, Tingmao Wang wrote:
> > > On 3/4/25 19:49, Mickaël Salaün wrote:
> > > > On Tue, Mar 04, 2025 at 01:13:01AM +0000, Tingmao Wang wrote:
> > > [...]
> > > > > +	/**
> > > > > +	 * @cookie: Opaque identifier to be included in the response.
> > > > > +	 */
> > > > > +	__u32 cookie;
> > > > 
> > > > I guess we could use a __u64 index counter per layer instead.  That
> > > > would also help to order requests if they are treated by different
> > > > supervisor threads.
> > > 
> > > I don't immediately see a use for ordering requests (if we get more than one
> > > event at once, they are coming from different threads anyway so there can't
> > > be any dependencies between them, and the supervisor threads can use
> > > timestamps), but I think making it a __u64 is probably a good idea
> > > regardless, as it means we don't have to do some sort of ID allocation, and
> > > can just increment an atomic.
> > 
> > Indeed, we should follow the seccomp unotify approach with a random u64
> > incremented per request.
> 
> Do you mean a random starting value, incremented by one per request, or

Yes

> something like the landlock_id in the audit patch (random increments too)?

There is no need for that because the supervisor is more privileged than
the sandbox.

> 
> > 
> > > 
> > > > > +};
> > > > > +
> > > > > +struct landlock_supervise_event {
> > > > > +	struct landlock_supervise_event_hdr hdr;
> > > > > +	__u64 access_request;
> > > > > +	__kernel_pid_t accessor;
> > > > > +	union {
> > > > > +		struct {
> > > > > +			/**
> > > > > +			 * @fd1: An open file descriptor for the file (open,
> > > > > +			 * delete, execute, link, readdir, rename, truncate),
> > > > > +			 * or the parent directory (for create operations
> > > > > +			 * targeting its child) being accessed.  Must be
> > > > > +			 * closed by the reader.
> > > > > +			 *
> > > > > +			 * If this points to a parent directory, @destname
> > > > > +			 * will contain the target filename. If @destname is
> > > > > +			 * empty, this points to the target file.
> > > > > +			 */
> > > > > +			int fd1;
> > > > > +			/**
> > > > > +			 * @fd2: For link or rename requests, a second file
> > > > > +			 * descriptor for the target parent directory.  Must
> > > > > +			 * be closed by the reader.  @destname contains the
> > > > > +			 * destination filename.  This field is -1 if not
> > > > > +			 * used.
> > > > > +			 */
> > > > > +			int fd2;
> > > > 
> > > > Can we just use one FD but identify the requested access instead and
> > > > send one event for each, like for the audit patch series?
> > > 
> > > I haven't managed to read or test out the audit patch yet (I will do), but I
> > > think having the ability to specifically tell whether the child is trying to
> > > move / rename / create a hard link of an existing file, and what it's trying
> > > to use as destination, might be useful (either for security, or purely for
> > > UX)?
> > > 
> > > For example, imagine something trying to link or move ~/.ssh/id_ecdsa to
> > > /tmp/innocent-tmp-file then read the latter. The supervisor can warn the
> > > user on the initial link attempt, and the shenanigan will probably be
> > > stopped there (although still, being able to say "[program] wants to link
> > > ~/.ssh/id_ecdsa to /tmp/innocent-tmp-file" seems better than just "[program]
> > > wants to create a link for ~/.ssh/id_ecdsa"), but even if somehow this ends
> > > up allowed, later on for the read request it could say something like
> > > 
> > > 	[program] wants to read /tmp/innocent-tmp-file
> > > 	    (previously moved from ~/.ssh/id_ecdsa)
> > > 
> > > Maybe this is a bit silly, but there might be other use cases for knowing
> > > the exact details of a rename/link request, either for at-the-time decision
> > > making, or tracking stuff for future requests?
> > 
> > This pattern looks like datagram packets.  I think we should use the
> > netlink attributes.  There were concern about using a netlink socket for
> > the seccomp unotification though:
> > https://lore.kernel.org/all/CALCETrXeZZfVzXh7SwKhyB=+ySDk5fhrrdrXrcABsQ=JpQT7Tg@mail.gmail.com/
> > 
> > There are two main differences with seccomp unotify:
> > - the supervisor should be able to receive arbitrary-sized data (e.g.
> >    file name, not path);
> > - the supervisor should be able to receive file descriptors (instead of
> >    path).
> > 
> > Sockets are created with socket(2) whereas in our case we should only
> > get a supervisor FD (indirectly) through landlock_restrict_self(2),
> > which clearly identifies a kernel object.  Another issue would be to
> > deal with network namespaces, probably by creating a private one.
> > Sockets are powerful but we don't needs all the routing complexity.
> > Moreover, we should only need a blocking communication channel to avoid
> > issues managing in-flight object references (transformed to FDs when
> > received).  That makes me think that a socket might not be the right
> > construct, but we can still rely on the NLA macros to define a proper
> > protocol with dynamically-sized events, received and send with dedicated
> > IOCTL commands.
> > 
> > Netlink already provides a way to send a cookie, and
> > netlink_attribute_type defines the types we'll need, including string.
> > 
> > For instance, a link request/event could include 3 packets, one for each
> > of these properties:
> > 1. the source file FD;
> > 2. the destination directory FD;
> > 3. the destination filename string.
> > 
> > This way we would avoid the union defined in this patch.
> 
> I had no idea about netlink - I will take a look.  Do you know if there is
> any existing code which uses it in a similar way (i.e. not creating an
> actual socket, but using netlink messages)?

I don't know.

> 
> I think in the end seccomp-unotify went with an ioctl with a custom struct
> seccomp_notif due to friction with the NL API [1] - do you think we will
> face the same problem here? (I will take a deeper look at netlink after
> sending this.)
> 
> (Tycho - could you weigh in?)
> 
> [1]: https://lore.kernel.org/all/CAGXu5jKsLDSBjB74SrvCvmGy_RTEjBsMtR5dk1CcRFrHEQfM_g@mail.gmail.com/

We need to check if the NLA API could work.  Kees's answer was missing
explanation.  Otherwise we should get inspiration from fanotify
messages.

> 
> > 
> > There is still the question about receiving FDs though. It would be nice
> > to have a (set of?) dedicated IOCTL(s) to receive an FD, but I'm not
> > sure how this could be properly handled wrt NLA.
> 
> Also, if we go with netlink messages, why do we need additional IOCTLs? Can
> we open the fd when we write out the message? (Maybe I will end up realizing
> the reason for this after reading netlink code, but I would )

It's much easier to have static-sized struct, both for developers and
for introspection tools (e.g. strace).  However, in this case we also
would also have variable-lenght data.  See my other reply discussing the
IOCTL idea.

> 
> > 
> > > 
> > > I will try out the audit patch to see how things like these appears in the
> > > log before commenting further on this. Maybe there is a way to achieve this
> > > while still simplifying the event structure?
> > > 
> > > > 
> > > > > +			/**
> > > > > +			 * @destname: A filename for a file creation target.
> > > > > +			 *
> > > > > +			 * If either of fd1 or fd2 points to a parent
> > > > > +			 * directory rather than the target file, this is the
> > > > > +			 * NULL-terminated name of the file that will be
> > > > > +			 * newly created.
> > > > > +			 *
> > > > > +			 * Counting the NULL terminator, this field will
> > > > > +			 * contain one or more NULL padding at the end so
> > > > > +			 * that the length of the whole struct
> > > > > +			 * landlock_supervise_event is a multiple of 8 bytes.
> > > > > +			 *
> > > > > +			 * This is a variable length member, and the length
> > > > > +			 * including the terminating NULL(s) can be derived
> > > > > +			 * from hdr.length - offsetof(struct
> > > > > +			 * landlock_supervise_event, destname).
> > > > > +			 */
> > > > > +			char destname[];
> > > > 
> > > > I'd prefer to avoid sending file names for now.  I don't think it's
> > > > necessary, and that could encourage supervisors to filter access
> > > > according to names.
> > > > 
> > > 
> > > This is also motivated by the potential UX I'm thinking of. For example, if
> > > a newly installed application tries to create ~/.app-name, it will be much
> > > more reassuring and convenient to the user if we can show something like
> > > 
> > > 	[program] wants to mkdir ~/.app-name. Allow this and future
> > > 	access to the new directory?
> > > 
> > > rather than just "[program] wants to mkdir under ~". (The "Allow this and
> > > future access to the new directory" bit is made possible by the supervisor
> > > knowing the name of the file/directory being created, and can remember them
> > > / write them out to a persistent profile etc)
> > > 
> > > Note that this is just the filename under the dir represented by fd - this
> > > isn't a path or anything that can be subject to symlink-related attacks,
> > > etc.  If a program calls e.g.
> > > mkdirat or openat (dfd -> "/some/", pathname="dir/stuff", O_CREAT)
> > > my understanding is that fd1 will point to /some/dir, and destname would be
> > > "stuff"
> > 
> > Right, this file name information would be useful.  In the case of
> > audit, the goal is to efficiently and asynchronously log security events
> > (and align with other LSM logs and related limitations), not primarily
> > to debug sandboxed apps nor to enrich this information for decision
> > making, but the supervisor feature would help here.  The patch message
> > should include this rationale.
> 
> Will do
> 
> > 
> > > 
> > > Actually, in case your question is "why not send a fd to represent the newly
> > > created file, instead of sending the name" -- I'm not sure whether you can
> > > open even an O_PATH fd to a non-existent file.
> > 
> > That would not be possible because it would not exist yet, a file name
> > (not file path) is OK for this case.
> > 
> > > 
> > > > > +		};
> > > > > +		struct {
> > > > > +			__u16 port;
> > > > > +		};
> > > > > +	};
> > > > > +};
> > > > > +
> > > > 
> > > > [...]
> > > 
> > > 
>