Kernel Self Protection Project
This project starts with the premise that kernel bugs have a very long lifetime, and that the kernel must be designed in ways to protect against these flaws. We must think of security beyond fixing bugs. As a community, we already find and fix individual bugs via static checkers (compiler flags, smatch, coccinelle, coverity) and dynamic checkers (kernel configs, trinity, KASan). Those efforts are important and on-going, but if we want to protect our billion Android phones, our cars, the International Space Station, and everything else running Linux, we must get proactive defensive technologies built into the upstream Linux kernel. We need the kernel to fail safely, instead of just running safely.
These kinds of protections have existed for years in PaX, grsecurity, and piles of academic papers. For various social, cultural, and technical reasons, they have not made their way into the upstream kernel, and this project seeks to change that. Our focus is on kernel self-protection, rather than kernel-supported userspace protections. The goal is to eliminate classes of bugs and eliminate methods of exploitation.
A short list of things to keep in mind when designing self-protection features:
- Patience and an open mind will be needed. We're all trying to make Linux better, so let's stay focused on the results.
- Features will be more than finding bugs. They should be active at run-time to catch previously unknown flaws.
- Features will not be developer-"opt-in". When a feature is enabled at build time, it should work for all code built into the kernel (which has the side-effect of also covering out-of-tree code, like in vendor forks).
Want to get involved? Join the kernel hardening mailing list and introduce yourself. Then pick an area of work from below (or add a new one), coordinate on the mailing list, and get started. If your employer is brave enough to understand how critical this work is, they'll pay you to work on it. If not, the Linux Foundation's Core Infrastructure Initiative is in a great position to fund specific work proposals. We need kernel developers, compiler developers, testers, backporters, and documentation writers.
While there are already a number of upstream kernel security features, we are still missing many. The following is far from a comprehensive list, but it is at least a starting point we can add to:
- Stack overflow
- Integer overflow
- Heap overflow
- Format string injection
- Kernel pointer leak
- Uninitialized variables
- Kernel location
- Text overwrite
- Function pointer overwrite
- Userspace execution
- Userspace data usage
- Reused code chunks
Completed Kernel Protections
The following kernel protections have already been accepted into the mainline Linux kernel, or are in some stage of development.
- Kernel reference counter overflow protection
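To illustrate what reference counter overflow protection guards against, here is a small model of a 32-bit counter. It is a sketch of the general idea only, not the kernel's implementation: the function names are invented for this example, and the counter is modelled as a plain Python integer. A wrapping counter that overflows returns to zero, which can make a still-referenced object look free; a saturating counter stays pinned at its maximum instead.

```python
UINT32_MAX = 0xFFFFFFFF  # largest value a 32-bit counter can hold


def inc_wrapping(count):
    """Plain 32-bit increment: one step past the maximum wraps to 0."""
    return (count + 1) & UINT32_MAX


def inc_saturating(count):
    """Saturating increment: once at the maximum, the counter stays there."""
    return count if count == UINT32_MAX else count + 1


# Hypothetical demonstration values.
print(inc_wrapping(UINT32_MAX))
print(inc_saturating(UINT32_MAX))
```

The design trade-off in the saturating version is that an object whose counter saturates is never freed (a memory leak), which is generally a safer failure than freeing memory that is still in use.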
Specific TODO Items
Besides the general work outlined above, there are a number of specific tasks that have either been asked about frequently or are otherwise in need of some time and attention:
- Split thread_info off of kernel stack (Done: x86, arm64, s390. Needed on arm, powerpc and others?)
- Move kernel stack to vmap area (Done: x86, arm64, s390. Needed on powerpc and others?)
- Implement kernel relocation and KASLR for ARM
- Write a plugin to clear struct padding
- Write a plugin to do format string warnings correctly (gcc's -Wformat-security is bad about const strings)
- Reorganize and rename CONFIG_DEBUG_RODATA (and related options) to something without "DEBUG" in the name (in progress)
- Make CONFIG_DEBUG_RODATA mandatory (done for arm64 and x86, other archs still need it)
- Convert remaining BPF JITs to eBPF JIT (with blinding)
- Write lib/test_bpf.c tests for eBPF constant blinding
- Further restriction of perf_event_open (e.g. perf_event_paranoid=3)
- Identify and extend HARDENED_USERCOPY to other usercopy functions (e.g. maybe csum_partial_copy_from_user, csum_and_copy_from_user, csum_and_copy_to_user, csum_partial_copy_nocheck?)
- Extend HARDENED_USERCOPY to use slab whitelisting
- Extend HARDENED_USERCOPY to split user-facing malloc()s and in-kernel malloc()s
- vmalloc stack guard pages
- protect ARM vector table as fixed-location kernel target
- disable kuser helpers on arm
- harden CONFIG_DEBUG_LIST, give it a better name, and make it default=y
- add zeroing of copy_from_user on failure test to test_usercopy.c
- consolidate all architectures' use of usercopy into asm-generic/uaccess.h
- add WARN path for page-spanning usercopy checks (instead of the separate CONFIG)
- create UNEXPECTED(), like BUG() but without the lock-busting, etc
- adjust usercopy CONFIG to be !DEVKMEM && STRICT_DEVMEM=y (PROC_KCORE is incompat with usercopy too)
- provide mechanism to check for ro_after_init memory areas, and reject structures not marked ro_after_init in vmbus_register()
- expand use of __ro_after_init, especially in arch/arm64
- Add stack-frame walking to usercopy implementations (Done: x86. In progress: arm64. Needed on arm, others?)
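As background for the "clear struct padding" item in the list above: compilers insert unnamed padding bytes between fields to satisfy alignment, and those bytes are not touched by field-by-field initialization, so they can carry stale memory contents when a structure is copied out. The sketch below uses Python's ctypes to count such bytes for a made-up two-field layout. The `Example` structure and the `padding_bytes` helper are hypothetical, and the exact count depends on the platform ABI.

```python
import ctypes


class Example(ctypes.Structure):
    # Hypothetical layout: a 1-byte field followed by a 4-byte field.
    _fields_ = [("flag", ctypes.c_uint8), ("value", ctypes.c_uint32)]


def padding_bytes(struct_type):
    """Return the number of bytes in the struct not covered by a declared field."""
    used = sum(ctypes.sizeof(field[1]) for field in struct_type._fields_)
    return ctypes.sizeof(struct_type) - used


print("total size:", ctypes.sizeof(Example))
print("padding bytes:", padding_bytes(Example))
```

On common ABIs where a 32-bit integer is 4-byte aligned, the single `flag` byte is followed by padding so that `value` starts on an aligned offset.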
People ask from time to time what a good set of security-focused build CONFIGs and runtime sysctls is. This is a brain-dump of the various options for a particularly paranoid system.
# Report BUG() conditions and kill the offending process.
CONFIG_BUG=y

# Make sure kernel page tables have safe permissions.
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_RODATA=y

# Use -fstack-protector-strong (gcc 4.9+) for best stack canary coverage.
CONFIG_CC_STACKPROTECTOR=y
CONFIG_CC_STACKPROTECTOR_STRONG=y

# Do not allow direct physical memory access (but if you must have it, at least enable STRICT mode...)
# CONFIG_DEVMEM is not set
CONFIG_STRICT_DEVMEM=y
CONFIG_IO_STRICT_DEVMEM=y

# Provides some protections against SYN flooding.
CONFIG_SYN_COOKIES=y

# Perform additional validation of various commonly targeted structures.
CONFIG_DEBUG_CREDENTIALS=y
CONFIG_DEBUG_NOTIFIERS=y
CONFIG_DEBUG_LIST=y
CONFIG_BUG_ON_DATA_CORRUPTION=y

# Provide userspace with seccomp BPF API for syscall attack surface reduction.
CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y

# Provide userspace with ptrace ancestry protections.
CONFIG_SECURITY=y
CONFIG_SECURITY_YAMA=y

# Perform usercopy bounds checking.
CONFIG_HARDENED_USERCOPY=y

# Randomize allocator freelists.
CONFIG_SLAB_FREELIST_RANDOM=y

# Allow allocator validation checking to be enabled (see "slub_debug=P" below).
CONFIG_SLUB_DEBUG=y

# Wipe higher-level memory allocations when they are freed (needs "page_poison=1" command line below).
# (If you can afford even more performance penalty, leave CONFIG_PAGE_POISONING_NO_SANITY=n)
CONFIG_PAGE_POISONING=y
CONFIG_PAGE_POISONING_NO_SANITY=y
CONFIG_PAGE_POISONING_ZERO=y

# Dangerous; enabling this allows direct physical memory writing.
# CONFIG_ACPI_CUSTOM_METHOD is not set

# Dangerous; enabling this disables brk ASLR.
# CONFIG_COMPAT_BRK is not set

# Dangerous; enabling this allows direct kernel memory writing.
# CONFIG_DEVKMEM is not set

# Dangerous; enabling this disables VDSO ASLR.
# CONFIG_COMPAT_VDSO is not set

# Dangerous; enabling this allows replacement of running kernel.
# CONFIG_KEXEC is not set

# Dangerous; enabling this allows replacement of running kernel.
# CONFIG_HIBERNATION is not set

# Prior to v4.1, assists heap memory attacks; best to keep interface disabled.
# CONFIG_INET_DIAG is not set

# Easily confused by misconfigured userspace, keep off.
# CONFIG_BINFMT_MISC is not set

# Use the modern PTY interface (devpts) only.
# CONFIG_LEGACY_PTYS is not set

# Reboot devices immediately if kernel experiences an Oops.
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_TIMEOUT=-1

# Keep root from altering kernel memory via loadable modules.
# CONFIG_MODULES is not set

# But if CONFIG_MODULES=y is needed, at least they must be signed with a per-build key.
CONFIG_DEBUG_SET_MODULE_RONX=y
CONFIG_MODULE_SIG=y
CONFIG_MODULE_SIG_FORCE=y
CONFIG_MODULE_SIG_ALL=y
CONFIG_MODULE_SIG_SHA512=y
CONFIG_MODULE_SIG_HASH="sha512"
CONFIG_MODULE_SIG_KEY="certs/signing_key.pem"
x86_32
# On 32-bit kernels, require PAE for NX bit support.
# CONFIG_M486 is not set
# CONFIG_HIGHMEM4G is not set
CONFIG_HIGHMEM64G=y
CONFIG_X86_PAE=y

# Disallow allocating the first 64k of memory.
CONFIG_DEFAULT_MMAP_MIN_ADDR=65536

# Randomize position of kernel.
CONFIG_RANDOMIZE_BASE=y
x86_64
# Full 64-bit means PAE and NX bit.
CONFIG_X86_64=y

# Disallow allocating the first 64k of memory.
CONFIG_DEFAULT_MMAP_MIN_ADDR=65536

# Randomize position of kernel and memory.
CONFIG_RANDOMIZE_BASE=y
CONFIG_RANDOMIZE_MEMORY=y

# Modern libc no longer needs a fixed-position mapping in userspace, remove it as a possible target.
CONFIG_LEGACY_VSYSCALL_NONE=y

# Remove additional attack surface, unless you really need them.
# CONFIG_IA32_EMULATION is not set
# CONFIG_X86_X32 is not set
# CONFIG_MODIFY_LDT_SYSCALL is not set
arm
# Disallow allocating the first 32k of memory (cannot be 64k due to ARM loader).
CONFIG_DEFAULT_MMAP_MIN_ADDR=32768

# For maximal userspace memory area (and maximum ASLR).
CONFIG_VMSPLIT_3G=y

# If building an out-of-tree Qualcomm kernel, this is similar to CONFIG_DEBUG_RODATA.
CONFIG_STRICT_MEMORY_RWX=y

# Make sure PXN/PAN emulation is enabled.
CONFIG_CPU_SW_DOMAIN_PAN=y

# Dangerous; old interfaces and needless additional attack surface.
# CONFIG_OABI_COMPAT is not set
arm64
# Disallow allocating the first 32k of memory (cannot be 64k due to ARM loader).
CONFIG_DEFAULT_MMAP_MIN_ADDR=32768

# Randomize position of kernel (requires UEFI RNG or bootloader support for /chosen/kaslr-seed DT property).
CONFIG_RANDOMIZE_BASE=y
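One way to audit an existing build against the options above is to parse its .config and compare the values. The following is a minimal sketch, not an official tool of this project: the `WANTED` table is a hand-picked subset of the options listed above, and the names `parse_config` and `check_config` are made up for this example.

```python
import re

# Hypothetical subset of the recommended options; extend as needed.
# A value of None means "must not be set".
WANTED = {
    "CONFIG_BUG": "y",
    "CONFIG_STRICT_DEVMEM": "y",
    "CONFIG_HARDENED_USERCOPY": "y",
    "CONFIG_DEVKMEM": None,
    "CONFIG_KEXEC": None,
}

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(text):
    """Map each option name to its value string, or None for 'is not set'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2).strip('"')
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = None
    return options


def check_config(text, wanted=WANTED):
    """Return a list of (option, expected, actual) tuples for every mismatch."""
    have = parse_config(text)
    problems = []
    for name, expected in sorted(wanted.items()):
        actual = have.get(name)  # options absent from the file count as not set
        if actual != expected:
            problems.append((name, expected, actual))
    return problems


# Hypothetical sample input standing in for a real .config file.
sample = """\
CONFIG_BUG=y
# CONFIG_DEVKMEM is not set
CONFIG_KEXEC=y
"""
for name, expected, actual in check_config(sample):
    print(f"{name}: expected {expected!r}, found {actual!r}")
```

In practice the text would come from the build tree's .config file. Treating an absent option as "not set" keeps the check simple, but it also means an option that does not exist on a given architecture or kernel version is reported the same way as one that was deliberately disabled.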
kernel command line options
# Enable slub/slab allocator free poisoning (requires CONFIG_SLUB_DEBUG=y above).
slub_debug=P

# Enable buddy allocator free poisoning (requires CONFIG_PAGE_POISONING=y above).
page_poison=1
x86_64
# Remove vsyscall entirely to avoid it being a fixed-position ROP target of any kind.
# (Same as CONFIG_LEGACY_VSYSCALL_NONE=y above.)
vsyscall=none
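To confirm that a booted system actually received these parameters, the command line can be checked programmatically. The sketch below is illustrative only: `parse_cmdline`, `missing_params`, and the `REQUIRED` table are names invented for this example, the parser splits on whitespace and does not handle quoted values, and it compares values exactly (so a combined flag string such as a longer slub_debug setting would be reported as a mismatch).

```python
# Parameters taken from the list above; vsyscall=none applies to x86_64 only.
REQUIRED = {"slub_debug": "P", "page_poison": "1", "vsyscall": "none"}


def parse_cmdline(cmdline):
    """Split a kernel command line into {parameter: value, or None if no '='}."""
    params = {}
    for token in cmdline.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(cmdline, required=REQUIRED):
    """Return the required 'name=value' settings not present on the command line."""
    have = parse_cmdline(cmdline)
    return sorted(
        f"{name}={value}"
        for name, value in required.items()
        if have.get(name) != value
    )


# Hypothetical command line used for demonstration.
example = "ro quiet slub_debug=P page_poison=1"
print(missing_params(example))
```

On a running Linux system the input string can be read from /proc/cmdline.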
sysctls
# Try to keep kernel address exposures out of various /proc files (kallsyms, modules, etc).
kernel.kptr_restrict = 1

# Avoid kernel memory address exposures via dmesg.
kernel.dmesg_restrict = 1

# Block non-uid-0 profiling (needs distro patch, otherwise this is the same as "= 2")
kernel.perf_event_paranoid = 3

# Turn off kexec, even if it's built in.
kernel.kexec_load_disabled = 1

# Avoid non-ancestor ptrace access to running processes and their credentials.
kernel.yama.ptrace_scope = 1
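The sysctl settings above can be compared against a live system by reading the matching files under /proc/sys, where each dot in a sysctl name corresponds to a directory separator. This is a rough sketch under that assumption; the helper names (`parse_sysctl_conf`, `sysctl_path`, `read_sysctl`, `audit`) are hypothetical, and values are compared as plain strings.

```python
def parse_sysctl_conf(text):
    """Parse 'name = value' lines, skipping blank lines and comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", ";")):
            continue
        name, sep, value = line.partition("=")
        if sep:
            settings[name.strip()] = value.strip()
    return settings


def sysctl_path(name, root="/proc/sys"):
    """Translate a dotted sysctl name into its path under /proc/sys."""
    return root + "/" + name.replace(".", "/")


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, or None if it cannot be read."""
    try:
        with open(sysctl_path(name, root)) as handle:
            return handle.read().strip()
    except OSError:
        return None


def audit(text, root="/proc/sys"):
    """Return {name: (wanted, actual)} for every setting that differs."""
    report = {}
    for name, wanted in parse_sysctl_conf(text).items():
        actual = read_sysctl(name, root)
        if actual != wanted:
            report[name] = (wanted, actual)
    return report


# Two of the settings listed above, used as sample input.
wanted_text = """\
kernel.kptr_restrict = 1
kernel.yama.ptrace_scope = 1
"""
for name, (wanted, actual) in audit(wanted_text).items():
    print(f"{name}: wanted {wanted}, found {actual}")
```

A value of None in the report means the file could not be read, for example because the sysctl does not exist on that kernel or the caller lacks permission.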