[PATCH v3 0/5] Fix Landlock audit test flakiness
Günther Noack
gnoack3000 at gmail.com
Thu Apr 2 20:52:46 UTC 2026
Hello!
On Thu, Apr 02, 2026 at 09:26:01PM +0200, Mickaël Salaün wrote:
> This series fixes two classes of audit selftest failures plus two minor
> bugs in the audit test helpers.
>
> The main issue is that domain deallocation audit records are emitted
> asynchronously from kworker threads and can arrive after a previous
> test's socket has been closed. This causes two distinct failure modes:
>
> - audit_match_record() picks up a stale deallocation record from a
> previous test instead of the expected one, causing a domain ID
> mismatch. The audit.layers test (which reads 16 deallocation records
> in sequence) is particularly vulnerable because the large read window
> allows stale records to interleave. Patch 4 fixes this by filtering
> deallocation records by domain ID and skipping type-matching records
> with wrong content patterns.
>
> - audit_count_records() counts stale deallocation records from a
> previous test, incrementing records.domain from the expected 0 to 1.
> Patch 3 fixes this by draining stale records at audit_init() time and
> removing records.domain == 0 checks that are not preceded by
> audit_match_record() calls (which would consume stale records).
>
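For anyone following along, the two mechanisms described above can be
sketched roughly as follows. This is not the code from patches 3 and 4:
drain_stale_records() and record_matches_domain() are hypothetical names,
and the "domain=<hex>" field layout is an assumption used only for
illustration.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: discards every datagram already queued on fd without
 * blocking. Returns the number of datagrams dropped, or -errno on an
 * unexpected error.
 */
int drain_stale_records(int fd)
{
	char buf[4096];
	int dropped = 0;

	for (;;) {
		ssize_t len = recv(fd, buf, sizeof(buf), MSG_DONTWAIT);

		if (len > 0) {
			dropped++;
			continue;
		}
		if (len == 0)
			return dropped;	/* Treated as "nothing more" here. */
		if (errno == EINTR)
			continue;
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return dropped;	/* Queue is empty. */
		return -errno;
	}
}

/*
 * Hypothetical helper: returns true if the record text contains a
 * "domain=<hex>" field (assumed layout) whose value equals want.
 */
bool record_matches_domain(const char *record, unsigned long long want)
{
	const char *field = strstr(record, "domain=");
	unsigned long long got;
	char *end;

	if (!field)
		return false;
	field += strlen("domain=");
	errno = 0;
	got = strtoull(field, &end, 16);
	if (end == field || errno)
		return false;
	return got == want;
}
```

A drain of this kind only removes records that are already queued when
it runs, so a record emitted by a kworker a moment later still gets
through. That is presumably why the per-record domain ID check is needed
in addition.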
> These races are more likely to manifest when additional instrumentation
> changes kworker timing in the deallocation path (e.g. with the upcoming
> Landlock tracepoints work).
>
> The two minor fixes (patches 1-2) correct a snprintf truncation check
> off-by-one and socket file descriptor leaks on error paths in
> audit_init(), audit_init_with_exe_filter(), and audit_cleanup().
> Patch 5 fixes a __u64 format warning reported by the kbuild bot on
> powerpc64.
>
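As background for the snprintf fix: snprintf() returns the length the
complete string would have had, not counting the terminating NUL, so a
return value equal to the buffer size already means truncation. A
minimal sketch of a correct check, assuming the off-by-one is the usual
">" versus ">=" confusion (format_checked() is a hypothetical helper,
not a function from audit.h):

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical helper: formats "prefix.suffix" into buf and reports whether
 * the whole string, including the terminating NUL, fitted into size bytes.
 */
bool format_checked(char *buf, size_t size, const char *prefix,
		    const char *suffix)
{
	int ret = snprintf(buf, size, "%s.%s", prefix, suffix);

	/*
	 * ret excludes the NUL, so ret == size means one byte was cut off.
	 * A check written as "ret > size" would miss exactly that case.
	 */
	return ret >= 0 && (size_t)ret < size;
}
```

With a 6-byte buffer, "ab.cd" (5 characters plus NUL) fits, while
"abc.cd" (6 characters) is truncated even though its length does not
exceed the buffer size.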
> Patch 1 is an exact subset of the v1 combined patch, which is why it
> carries the Reviewed-by tag. Patches 2 and 3 extend beyond what was in
> v1, so the Reviewed-by is not carried. Patches 4 and 5 are new.
>
> Changes since v2:
> https://lore.kernel.org/r/20260401161503.1136946-1-mic@digikod.net
> - Patches 4-5: fix __u64 format warnings on powerpc64 (cast to unsigned
> long long for %llx). Patch 5 is new.
>
> Changes since v1:
> https://lore.kernel.org/r/20260312100444.2609563-8-mic@digikod.net
> - Split the combined drain fix into four separate patches.
> - Patch 2: extend fd leak fix to audit_init_with_exe_filter() and
> audit_cleanup().
> - Patch 3: also remove domain checks from audit.trace and
> scoped_audit.connect_to_child, document constraint, explain why a
> longer drain timeout was rejected.
> - Patch 4: new, add domain ID filtering and timeout management to
> matches_log_domain_deallocated(), skip stale records in
> audit_match_record().
>
> Mickaël Salaün (5):
> selftests/landlock: Fix snprintf truncation checks in audit helpers
> selftests/landlock: Fix socket file descriptor leaks in audit helpers
> selftests/landlock: Drain stale audit records on init
> selftests/landlock: Skip stale records in audit_match_record()
> selftests/landlock: Fix format warning for __u64 in net_test
>
> tools/testing/selftests/landlock/audit.h | 133 ++++++++++++++----
> tools/testing/selftests/landlock/audit_test.c | 36 ++---
> tools/testing/selftests/landlock/net_test.c | 2 +-
> .../testing/selftests/landlock/ptrace_test.c | 1 -
> .../landlock/scoped_abstract_unix_test.c | 1 -
> 5 files changed, 119 insertions(+), 54 deletions(-)
>
> --
> 2.53.0
>
I am afraid I am still getting flaky audit tests even with these
patches. Which tests fail differs from run to run; in one run, for
example:
# RUN audit_layout1.remove_dir ...
# fs_test.c:7281:remove_dir:Expected 0 (0) == matches_log_fs(_metadata, self->audit_fd, "fs\\.remove_dir", dir_s1d2) (-11)
# remove_dir: Test failed
# ❌ FAIL audit_layout1.remove_dir
not ok 191 audit_layout1.remove_dir
# RUN audit_layout1.read_dir ...
# ✅ OK audit_layout1.read_dir
ok 192 audit_layout1.read_dir
# RUN audit_layout1.read_file ...
# ✅ OK audit_layout1.read_file
ok 193 audit_layout1.read_file
# RUN audit_layout1.write_file ...
# fs_test.c:7221:write_file:Expected 0 (0) == matches_log_fs(_metadata, self->audit_fd, "fs\\.write_file", file1_s1d1) (-11)
# fs_test.c:7224:write_file:Expected 0 (0) == records.access (1)
# write_file: Test failed
# ❌ FAIL audit_layout1.write_file
not ok 194 audit_layout1.write_file
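The -11 in these failures is -EAGAIN on x86 Linux, which suggests that
the helper gave up waiting for a matching record rather than receiving a
wrong one. A minimal sketch of how that value can arise, assuming the
audit socket is read with a receive timeout (SO_RCVTIMEO) and the error
is passed up as -errno; recv_or_errno() is a hypothetical name, not a
helper from audit.h:

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/*
 * Hypothetical helper: waits up to usec microseconds (must be positive, a
 * zero timeout means "block forever") for one datagram. Returns its length,
 * or -errno on failure, which is -EAGAIN when the wait times out.
 */
ssize_t recv_or_errno(int fd, void *buf, size_t size, long usec)
{
	struct timeval tv = {
		.tv_sec = usec / 1000000,
		.tv_usec = usec % 1000000,
	};
	ssize_t len;

	if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)))
		return -errno;

	do {
		len = recv(fd, buf, size, 0);
	} while (len < 0 && errno == EINTR);

	return len < 0 ? -errno : len;
}
```

If that reading is right, the expected denial record either arrived
after the timeout or was consumed earlier, which would also fit the
leftover records.access count of 1 in the write_file failure.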
My kernel config is this:
  make defconfig
  make kvm_guest.config
  KCONFIG_CONFIG="${KBUILD_OUTPUT}/.config" ./scripts/kconfig/merge_config.sh "${KBUILD_OUTPUT}/.config" tools/testing/selftests/landlock/config
  make debug.config
  echo "CONFIG_RANDOMIZE_BASE=n" >> "${KBUILD_OUTPUT}/.config"
  make olddefconfig
and then I run the selftests in Qemu with these flags:
  qemu-system-x86_64 \
    -nographic \
    -m 4G \
    -enable-kvm \
    -append "console=ttyS0 lsm=landlock no_hash_pointers" \
    -kernel "${KBUILD_OUTPUT}/arch/x86/boot/bzImage" \
    -initrd "${INITRAMFS}"
This is using my own selftest runner scripts, which build an initramfs
with the statically linked selftests.
Do you have a hunch what might be missing there? In the test run
above, I have applied your V4 patch set on top of the current master,
5619b098e2fbf3a23bf13d91897056a1fe238c6d ("Merge tag 'for-7.0-rc6-tag'
of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux").
–Günther