[PATCH v3 0/5] Fix Landlock audit test flakiness

Thu Apr 2 20:52:46 UTC 2026

Hello!

On Thu, Apr 02, 2026 at 09:26:01PM +0200, Mickaël Salaün wrote:
> This series fixes two classes of audit selftest failures plus two minor
> bugs in the audit test helpers.
> 
> The main issue is that domain deallocation audit records are emitted
> asynchronously from kworker threads and can arrive after a previous
> test's socket has been closed.  This causes two distinct failure modes:
> 
> - audit_match_record() picks up a stale deallocation record from a
>   previous test instead of the expected one, causing a domain ID
>   mismatch.  The audit.layers test (which reads 16 deallocation records
>   in sequence) is particularly vulnerable because the large read window
>   allows stale records to interleave.  Patch 4 fixes this by filtering
>   deallocation records by domain ID and skipping type-matching records
>   with wrong content patterns.
> 
> - audit_count_records() counts stale deallocation records from a
>   previous test, incrementing records.domain from the expected 0 to 1.
>   Patch 3 fixes this by draining stale records at audit_init() time and
>   removing records.domain == 0 checks that are not preceded by
>   audit_match_record() calls (which would consume stale records).
> 
> These races are more likely to manifest when additional instrumentation
> changes kworker timing in the deallocation path (e.g. with the upcoming
> Landlock tracepoints work).
> 
> The two minor fixes (patches 1-2) correct a snprintf truncation check
> off-by-one and socket file descriptor leaks on error paths in
> audit_init(), audit_init_with_exe_filter(), and audit_cleanup().
> Patch 5 fixes a __u64 format warning reported by the kbuild bot on
> powerpc64.
> 
> Patch 1 is an exact subset of the v1 combined patch, which is why it
> carries the Reviewed-by tag.  Patches 2 and 3 extend beyond what was in
> v1, so the Reviewed-by is not carried.  Patches 4 and 5 are new.
> 
> Changes since v2:
> https://lore.kernel.org/r/20260401161503.1136946-1-mic@digikod.net
> - Patches 4-5: fix __u64 format warnings on powerpc64 (cast to unsigned
>   long long for %llx).  Patch 5 is new.
> 
> Changes since v1:
> https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
> - Split the combined drain fix into four separate patches.
> - Patch 2: extend fd leak fix to audit_init_with_exe_filter() and
>   audit_cleanup().
> - Patch 3: also remove domain checks from audit.trace and
>   scoped_audit.connect_to_child, document constraint, explain why a
>   longer drain timeout was rejected.
> - Patch 4: new, add domain ID filtering and timeout management to
>   matches_log_domain_deallocated(), skip stale records in
>   audit_match_record().
> 
> Mickaël Salaün (5):
>   selftests/landlock: Fix snprintf truncation checks in audit helpers
>   selftests/landlock: Fix socket file descriptor leaks in audit helpers
>   selftests/landlock: Drain stale audit records on init
>   selftests/landlock: Skip stale records in audit_match_record()
>   selftests/landlock: Fix format warning for __u64 in net_test
> 
>  tools/testing/selftests/landlock/audit.h      | 133 ++++++++++++++----
>  tools/testing/selftests/landlock/audit_test.c |  36 ++---
>  tools/testing/selftests/landlock/net_test.c   |   2 +-
>  .../testing/selftests/landlock/ptrace_test.c  |   1 -
>  .../landlock/scoped_abstract_unix_test.c      |   1 -
>  5 files changed, 119 insertions(+), 54 deletions(-)
> 
> -- 
> 2.53.0
> 
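
For reference on the snprintf truncation check mentioned in the cover
letter, here is a minimal sketch of the general rule, not the actual
audit.h code.  snprintf() returns the length the full output would have
had, excluding the terminating NUL, so the output was truncated whenever
the return value is greater than or equal to the buffer size.  The helper
name format_domain_id() and the "domain=%llx" format are assumptions made
for illustration only.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper (not taken from audit.h): formats a domain ID into
 * buf and reports whether the whole string, including the NUL, fit.
 */
static bool format_domain_id(char *buf, size_t size, unsigned long long id)
{
	int ret = snprintf(buf, size, "domain=%llx", id);

	/*
	 * ret < 0 means an output error.  ret >= size means truncation:
	 * checking "ret > size" instead would miss the case where exactly
	 * the terminating NUL did not fit.
	 */
	return ret >= 0 && (size_t)ret < size;
}
```

With a 9-byte buffer, "domain=1f" (9 characters plus NUL) does not fit
and the helper reports failure, whereas a "ret > size" check would accept
the truncated string.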

I am afraid I am still getting flaky audit tests even with these
patches applied.  Which tests fail varies from run to run; for
example:

#  RUN           audit_layout1.remove_dir ...
# fs_test.c:7281:remove_dir:Expected 0 (0) == matches_log_fs(_metadata, self->audit_fd, "fs\\.remove_dir", dir_s1d2) (-11)
# remove_dir: Test failed
#          ❌ FAIL  audit_layout1.remove_dir
not ok 191 audit_layout1.remove_dir
#  RUN           audit_layout1.read_dir ...
#            ✅ OK  audit_layout1.read_dir
ok 192 audit_layout1.read_dir
#  RUN           audit_layout1.read_file ...
#            ✅ OK  audit_layout1.read_file
ok 193 audit_layout1.read_file
#  RUN           audit_layout1.write_file ...
# fs_test.c:7221:write_file:Expected 0 (0) == matches_log_fs(_metadata, self->audit_fd, "fs\\.write_file", file1_s1d1) (-11)
# fs_test.c:7224:write_file:Expected 0 (0) == records.access (1)
# write_file: Test failed
#          ❌ FAIL  audit_layout1.write_file
not ok 194 audit_layout1.write_file
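
Assuming matches_log_fs() propagates a negative errno value (which the
-11 suggests, but which is not confirmed by the output itself), the
failures above would correspond to -EAGAIN on x86-64 Linux, i.e. the read
on the audit socket gave up without seeing a matching record.  A small
sketch for decoding such return values; describe_ret() is a hypothetical
helper, not part of the selftests:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: turns a negative-errno style return value, such as
 * the -11 in the failure output, into a human-readable description.
 */
static const char *describe_ret(int ret)
{
	/* Non-negative values are treated as success in this sketch. */
	return ret < 0 ? strerror(-ret) : "success";
}
```

Calling describe_ret(-EAGAIN) yields the C library's description for
EAGAIN, which makes it easier to tell a receive timeout apart from a
pattern mismatch when reading the test log.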

My kernel config is this:

    make defconfig
    make kvm_guest.config
    KCONFIG_CONFIG="${KBUILD_OUTPUT}/.config" ./scripts/kconfig/merge_config.sh "${KBUILD_OUTPUT}/.config" tools/testing/selftests/landlock/config
    make debug.config
    echo "CONFIG_RANDOMIZE_BASE=n" >> "${KBUILD_OUTPUT}/.config"
    make olddefconfig

and then I run the selftests in QEMU with these flags:

    qemu-system-x86_64 \
        -nographic \
        -m 4G \
        -enable-kvm \
        -append "console=ttyS0 lsm=landlock no_hash_pointers" \
        -kernel "${KBUILD_OUTPUT}/arch/x86/boot/bzImage" \
        -initrd "${INITRAMFS}"

This is using my own selftest runner scripts, which build an initramfs
containing the statically linked selftests.

Do you have a hunch what might be missing there?  In the test run
above, I have applied your V4 patch set on top of the current master,
5619b098e2fbf3a23bf13d91897056a1fe238c6d ("Merge tag 'for-7.0-rc6-tag'
of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux").

–Günther