[RFC PATCH 5/9] Define user structure for events and responses.

Sat Mar 8 19:07:48 UTC 2025

On Thu, Mar 06, 2025 at 03:05:10AM +0000, Tingmao Wang wrote:
> On 3/4/25 19:49, Mickaël Salaün wrote:
> > On Tue, Mar 04, 2025 at 01:13:01AM +0000, Tingmao Wang wrote:
> [...]
> > > +	/**
> > > +	 * @cookie: Opaque identifier to be included in the response.
> > > +	 */
> > > +	__u32 cookie;
> > 
> > I guess we could use a __u64 index counter per layer instead.  That
> > would also help to order requests if they are treated by different
> > supervisor threads.
> 
> I don't immediately see a use for ordering requests (if we get more than one
> event at once, they are coming from different threads anyway so there can't
> be any dependencies between them, and the supervisor threads can use
> timestamps), but I think making it a __u64 is probably a good idea
> regardless, as it means we don't have to do some sort of ID allocation, and
> can just increment an atomic.

Indeed, we should follow the seccomp unotify approach with a random u64
incremented per request.

> 
> > > +};
> > > +
> > > +struct landlock_supervise_event {
> > > +	struct landlock_supervise_event_hdr hdr;
> > > +	__u64 access_request;
> > > +	__kernel_pid_t accessor;
> > > +	union {
> > > +		struct {
> > > +			/**
> > > +			 * @fd1: An open file descriptor for the file (open,
> > > +			 * delete, execute, link, readdir, rename, truncate),
> > > +			 * or the parent directory (for create operations
> > > +			 * targeting its child) being accessed.  Must be
> > > +			 * closed by the reader.
> > > +			 *
> > > +			 * If this points to a parent directory, @destname
> > > +			 * will contain the target filename. If @destname is
> > > +			 * empty, this points to the target file.
> > > +			 */
> > > +			int fd1;
> > > +			/**
> > > +			 * @fd2: For link or rename requests, a second file
> > > +			 * descriptor for the target parent directory.  Must
> > > +			 * be closed by the reader.  @destname contains the
> > > +			 * destination filename.  This field is -1 if not
> > > +			 * used.
> > > +			 */
> > > +			int fd2;
> > 
> > Can we just use one FD but identify the requested access instead and
> > send one event for each, like for the audit patch series?
> 
> I haven't managed to read or test out the audit patch yet (I will do), but I
> think having the ability to specifically tell whether the child is trying to
> move / rename / create a hard link of an existing file, and what it's trying
> to use as destination, might be useful (either for security, or purely for
> UX)?
> 
> For example, imagine something trying to link or move ~/.ssh/id_ecdsa to
> /tmp/innocent-tmp-file then read the latter. The supervisor can warn the
> user on the initial link attempt, and the shenanigan will probably be
> stopped there (although still, being able to say "[program] wants to link
> ~/.ssh/id_ecdsa to /tmp/innocent-tmp-file" seems better than just "[program]
> wants to create a link for ~/.ssh/id_ecdsa"), but even if somehow this ends
> up allowed, later on for the read request it could say something like
> 
> 	[program] wants to read /tmp/innocent-tmp-file
> 	    (previously moved from ~/.ssh/id_ecdsa)
> 
> Maybe this is a bit silly, but there might be other use cases for knowing
> the exact details of a rename/link request, either for at-the-time decision
> making, or tracking stuff for future requests?

This pattern looks like datagram packets.  I think we should use the
netlink attributes.  There were concern about using a netlink socket for
the seccomp unotification though:
https://lore.kernel.org/all/CALCETrXeZZfVzXh7SwKhyB=+ySDk5fhrrdrXrcABsQ=JpQT7Tg@mail.gmail.com/

There are two main differences with seccomp unotify:
- the supervisor should be able to receive arbitrary-sized data (e.g.
  file name, not path);
- the supervisor should be able to receive file descriptors (instead of
  path).

Sockets are created with socket(2) whereas in our case we should only
get a supervisor FD (indirectly) through landlock_restrict_self(2),
which clearly identifies a kernel object.  Another issue would be to
deal with network namespaces, probably by creating a private one.
Sockets are powerful but we don't needs all the routing complexity.
Moreover, we should only need a blocking communication channel to avoid
issues managing in-flight object references (transformed to FDs when
received).  That makes me think that a socket might not be the right
construct, but we can still rely on the NLA macros to define a proper
protocol with dynamically-sized events, received and send with dedicated
IOCTL commands.

Netlink already provides a way to send a cookie, and
netlink_attribute_type defines the types we'll need, including string.

For instance, a link request/event could include 3 packets, one for each
of these properties:
1. the source file FD;
2. the destination directory FD;
3. the destination filename string.

This way we would avoid the union defined in this patch.

There is still the question about receiving FDs though. It would be nice
to have a (set of?) dedicated IOCTL(s) to receive an FD, but I'm not
sure how this could be properly handled wrt NLA.

> 
> I will try out the audit patch to see how things like these appears in the
> log before commenting further on this. Maybe there is a way to achieve this
> while still simplifying the event structure?
> 
> > 
> > > +			/**
> > > +			 * @destname: A filename for a file creation target.
> > > +			 *
> > > +			 * If either of fd1 or fd2 points to a parent
> > > +			 * directory rather than the target file, this is the
> > > +			 * NULL-terminated name of the file that will be
> > > +			 * newly created.
> > > +			 *
> > > +			 * Counting the NULL terminator, this field will
> > > +			 * contain one or more NULL padding at the end so
> > > +			 * that the length of the whole struct
> > > +			 * landlock_supervise_event is a multiple of 8 bytes.
> > > +			 *
> > > +			 * This is a variable length member, and the length
> > > +			 * including the terminating NULL(s) can be derived
> > > +			 * from hdr.length - offsetof(struct
> > > +			 * landlock_supervise_event, destname).
> > > +			 */
> > > +			char destname[];
> > 
> > I'd prefer to avoid sending file names for now.  I don't think it's
> > necessary, and that could encourage supervisors to filter access
> > according to names.
> > 
> 
> This is also motivated by the potential UX I'm thinking of. For example, if
> a newly installed application tries to create ~/.app-name, it will be much
> more reassuring and convenient to the user if we can show something like
> 
> 	[program] wants to mkdir ~/.app-name. Allow this and future
> 	access to the new directory?
> 
> rather than just "[program] wants to mkdir under ~". (The "Allow this and
> future access to the new directory" bit is made possible by the supervisor
> knowing the name of the file/directory being created, and can remember them
> / write them out to a persistent profile etc)
> 
> Note that this is just the filename under the dir represented by fd - this
> isn't a path or anything that can be subject to symlink-related attacks,
> etc.  If a program calls e.g.
> mkdirat or openat (dfd -> "/some/", pathname="dir/stuff", O_CREAT)
> my understanding is that fd1 will point to /some/dir, and destname would be
> "stuff"

Right, this file name information would be useful.  In the case of
audit, the goal is to efficiently and asynchronously log security events
(and align with other LSM logs and related limitations), not primarily
to debug sandboxed apps nor to enrich this information for decision
making, but the supervisor feature would help here.  The patch message
should include this rationale.

> 
> Actually, in case your question is "why not send a fd to represent the newly
> created file, instead of sending the name" -- I'm not sure whether you can
> open even an O_PATH fd to a non-existent file.

That would not be possible because it would not exist yet, a file name
(not file path) is OK for this case.

> 
> > > +		};
> > > +		struct {
> > > +			__u16 port;
> > > +		};
> > > +	};
> > > +};
> > > +
> > 
> > [...]
> 
>