Kernel Self Protection Project
Mission Statement
This project starts with the premise that kernel bugs have a very long lifetime, and that the kernel must be designed in ways to protect against these flaws. We must think of security beyond fixing bugs. As a community, we already find and fix individual bugs via static checkers (compiler flags, smatch, coccinelle, coverity) and dynamic checkers (kernel configs, trinity, KASan). Those efforts are important and on-going, but if we want to protect our billion Android phones, our cars, the International Space Station, and everything else running Linux, we must get proactive defensive technologies built into the upstream Linux kernel. We need the kernel to fail safely, instead of just running safely.
These kinds of protections have existed for years in the PaX and grsecurity patches, and in piles of academic papers. For various social, cultural, and technical reasons, they have not made their way into the upstream kernel, and this project seeks to change that. Our focus is on kernel self-protection, rather than kernel-supported userspace protections. The goal is to eliminate classes of bugs and eliminate methods of exploitation.
Principles
A short list of things to keep in mind when designing self-protection features:
- Patience and an open mind will be needed. We're all trying to make Linux better, so let's stay focused on the results.
- Upstream development is evolutionary, not revolutionary, which means it can sometimes take time for features to become fully realized.
- Features will be more than finding bugs, and should be active at run-time to catch previously unknown flaws.
- Features will not be developer-"opt-in". When a feature is enabled at build time, it should work for all code built into the kernel (which has the side-effect of also covering out-of-tree code, like in vendor forks).
Get Involved
Want to get involved? Join the kernel hardening mailing list and introduce yourself. Then pick an area of work from below (or add a new one), coordinate on the mailing list, and get started. If your employer is brave enough to understand how critical this work is, they'll pay you to work on it. If not, the Linux Foundation's Core Infrastructure Initiative is in a great position to fund specific work proposals. We need kernel developers, compiler developers, testers, backporters, and documentation writers.
Work Areas
Although there are already a number of upstream kernel security features, many are still missing. The following is far from a comprehensive list, but it's a starting point we can add to:
Bug Classes
- Stack overflow
- Integer overflow
- Heap overflow
- Format string injection
- Kernel pointer leak
- Uninitialized variables
- Use-after-free
Exploitation Methods
- Kernel location
- Text overwrite
- Function pointer overwrite
- Userspace execution
- Userspace data usage
- Reused code chunks
Specific TODO Items
Besides the general work outlined above, there are a number of specific tasks that have either been asked about frequently or are otherwise in need of some time and attention:
- Split thread_info off of kernel stack (Done: x86, arm64, s390. Needed on arm, powerpc and others?)
- Move kernel stack to vmap area (Done: x86, s390. Needed on arm, arm64, powerpc and others?)
- Implement kernel relocation and KASLR for ARM
- Write a plugin to clear struct padding
- Write a plugin to do format string warnings correctly (gcc's -Wformat-security is bad about const strings)
- Make CONFIG_DEBUG_RODATA mandatory (done for arm64 and x86, other archs still need it)
- Convert remaining BPF JITs to eBPF JIT (with blinding)
- Write lib/test_bpf.c tests for eBPF constant blinding
- Further restriction of perf_event_open (e.g. perf_event_paranoid=3)
- Extend HARDENED_USERCOPY to use slab whitelisting (in progress)
- Extend HARDENED_USERCOPY to split user-facing malloc()s and in-kernel malloc()s
- vmalloc stack guard pages (in progress)
- Protect ARM vector table as fixed-location kernel target
- Disable kuser helpers on arm
- Rename CONFIG_DEBUG_LIST better and default=y
- Add WARN path for page-spanning usercopy checks (instead of the separate CONFIG)
- Create UNEXPECTED(), like BUG() but without the lock-busting, etc
- Create defconfig "make" target for by-default hardened Kconfigs (using guidelines below)
- Provide mechanism to check for ro_after_init memory areas, and reject structures not marked ro_after_init in vmbus_register()
- Expand use of __ro_after_init, especially in arch/arm64
- Add stack-frame walking to usercopy implementations (Done: x86. In progress: arm64. Needed on arm, others?)
- Restrict autoloading of kernel modules (like GRKERNSEC_MODHARDEN) (In progress: Timgad LSM)
Documentation
For kernel protections already in upstream (or under active development) that have specific documentation:
Self-Protection Guidelines
refcount_t
- Kernel reference counter overflow protection
Recommended settings
People ask from time to time what a good set of security-related build CONFIGs and runtime sysctls is. This is a brain-dump of the various options for a particularly paranoid system.
CONFIGs
# Report BUG() conditions and kill the offending process.
CONFIG_BUG=y

# Make sure kernel page tables have safe permissions.
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_RODATA=y (prior to v4.11)
CONFIG_STRICT_KERNEL_RWX=y (since v4.11)

# Report any dangerous memory permissions (not available on all archs).
CONFIG_DEBUG_WX=y

# Use -fstack-protector-strong (gcc 4.9+) for best stack canary coverage.
CONFIG_CC_STACKPROTECTOR=y
CONFIG_CC_STACKPROTECTOR_STRONG=y

# Do not allow direct physical memory access (but if you must have it, at least enable STRICT mode...)
# CONFIG_DEVMEM is not set
CONFIG_STRICT_DEVMEM=y
CONFIG_IO_STRICT_DEVMEM=y

# Provides some protections against SYN flooding.
CONFIG_SYN_COOKIES=y

# Perform additional validation of various commonly targeted structures.
CONFIG_DEBUG_CREDENTIALS=y
CONFIG_DEBUG_NOTIFIERS=y
CONFIG_DEBUG_LIST=y
CONFIG_DEBUG_SG=y
CONFIG_BUG_ON_DATA_CORRUPTION=y

# Provide userspace with seccomp BPF API for syscall attack surface reduction.
CONFIG_SECCOMP=y
CONFIG_SECCOMP_FILTER=y

# Provide userspace with ptrace ancestry protections.
CONFIG_SECURITY=y
CONFIG_SECURITY_YAMA=y

# Perform usercopy bounds checking.
CONFIG_HARDENED_USERCOPY=y

# Randomize allocator freelists.
CONFIG_SLAB_FREELIST_RANDOM=y

# Allow allocator validation checking to be enabled (see "slub_debug=P" below).
CONFIG_SLUB_DEBUG=y

# Wipe higher-level memory allocations when they are freed (needs "page_poison=1" command line below).
# (If you can afford even more performance penalty, leave CONFIG_PAGE_POISONING_NO_SANITY=n)
CONFIG_PAGE_POISONING=y
CONFIG_PAGE_POISONING_NO_SANITY=y
CONFIG_PAGE_POISONING_ZERO=y

# Adds guard pages to kernel stacks (not all architectures support this yet).
CONFIG_VMAP_STACK=y

# Dangerous; enabling this allows direct physical memory writing.
# CONFIG_ACPI_CUSTOM_METHOD is not set

# Dangerous; enabling this disables brk ASLR.
# CONFIG_COMPAT_BRK is not set

# Dangerous; enabling this allows direct kernel memory writing.
# CONFIG_DEVKMEM is not set

# Dangerous; exposes kernel text image layout.
# CONFIG_PROC_KCORE is not set

# Dangerous; enabling this disables VDSO ASLR.
# CONFIG_COMPAT_VDSO is not set

# Dangerous; enabling this allows replacement of running kernel.
# CONFIG_KEXEC is not set

# Dangerous; enabling this allows replacement of running kernel.
# CONFIG_HIBERNATION is not set

# Prior to v4.1, assists heap memory attacks; best to keep interface disabled.
# CONFIG_INET_DIAG is not set

# Easily confused by misconfigured userspace, keep off.
# CONFIG_BINFMT_MISC is not set

# Use the modern PTY interface (devpts) only.
# CONFIG_LEGACY_PTYS is not set

# Reboot devices immediately if kernel experiences an Oops.
CONFIG_PANIC_ON_OOPS=y
CONFIG_PANIC_TIMEOUT=-1

# Keep root from altering kernel memory via loadable modules.
# CONFIG_MODULES is not set

# But if CONFIG_MODULES=y is needed, at least they must be signed with a per-build key.
CONFIG_DEBUG_SET_MODULE_RONX=y (prior to v4.11)
CONFIG_STRICT_MODULE_RWX=y (since v4.11)
CONFIG_MODULE_SIG=y
CONFIG_MODULE_SIG_FORCE=y
CONFIG_MODULE_SIG_ALL=y
CONFIG_MODULE_SIG_SHA512=y
CONFIG_MODULE_SIG_HASH="sha512"
CONFIG_MODULE_SIG_KEY="certs/signing_key.pem"
x86_32
# On 32-bit kernels, require PAE for NX bit support.
# CONFIG_M486 is not set
# CONFIG_HIGHMEM4G is not set
CONFIG_HIGHMEM64G=y
CONFIG_X86_PAE=y

# Disallow allocating the first 64k of memory.
CONFIG_DEFAULT_MMAP_MIN_ADDR=65536

# Randomize position of kernel.
CONFIG_RANDOMIZE_BASE=y
x86_64
# Full 64-bit means PAE and NX bit.
CONFIG_X86_64=y

# Disallow allocating the first 64k of memory.
CONFIG_DEFAULT_MMAP_MIN_ADDR=65536

# Randomize position of kernel and memory.
CONFIG_RANDOMIZE_BASE=y
CONFIG_RANDOMIZE_MEMORY=y

# Modern libc no longer needs a fixed-position mapping in userspace, remove it as a possible target.
CONFIG_LEGACY_VSYSCALL_NONE=y

# Remove additional attack surface, unless you really need them.
# CONFIG_IA32_EMULATION is not set
# CONFIG_X86_X32 is not set
# CONFIG_MODIFY_LDT_SYSCALL is not set
arm
# Disallow allocating the first 32k of memory (cannot be 64k due to ARM loader).
CONFIG_DEFAULT_MMAP_MIN_ADDR=32768

# For maximal userspace memory area (and maximum ASLR).
CONFIG_VMSPLIT_3G=y

# If building an old out-of-tree Qualcomm kernel, this is similar to CONFIG_STRICT_KERNEL_RWX.
CONFIG_STRICT_MEMORY_RWX=y

# Make sure PXN/PAN emulation is enabled.
CONFIG_CPU_SW_DOMAIN_PAN=y

# Dangerous; old interfaces and needless additional attack surface.
# CONFIG_OABI_COMPAT is not set
arm64
# Disallow allocating the first 32k of memory (cannot be 64k due to ARM loader).
CONFIG_DEFAULT_MMAP_MIN_ADDR=32768

# Randomize position of kernel (requires UEFI RNG or bootloader support for /chosen/kaslr-seed DT property).
CONFIG_RANDOMIZE_BASE=y

# Make sure PAN emulation is enabled.
CONFIG_ARM64_SW_TTBR0_PAN=y
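The CONFIG recommendations above can be compared against a real build configuration mechanically. The following is a minimal sketch, not an official KSPP tool: the function names `parse_config` and `check_config` and the `RECOMMENDED` table are hypothetical, and the table holds only a small illustrative subset of the options listed above. It assumes the usual `.config` text format, where enabled options appear as `CONFIG_FOO=value` and disabled ones as `# CONFIG_FOO is not set`.

```python
import re

# Illustrative subset of the recommendations above (extend as needed).
# A wanted value of None means the option should be unset.
RECOMMENDED = {
    "CONFIG_BUG": "y",
    "CONFIG_HARDENED_USERCOPY": "y",
    "CONFIG_SLAB_FREELIST_RANDOM": "y",
    "CONFIG_DEFAULT_MMAP_MIN_ADDR": "65536",
    "CONFIG_DEVKMEM": None,
    "CONFIG_KEXEC": None,
}

_SET_RE = re.compile(r"^(CONFIG_\w+)=(.*)$")


def parse_config(text):
    """Return {option: value} for every option enabled in .config-style text."""
    options = {}
    for line in text.splitlines():
        match = _SET_RE.match(line.strip())
        if not match:
            continue  # comments, "is not set" lines, blank lines
        value = match.group(2).strip('"')
        if value != "n":  # treat an explicit "=n" the same as unset
            options[match.group(1)] = value
    return options


def check_config(text, recommended=RECOMMENDED):
    """Return (option, wanted, found) tuples for every mismatch, sorted by name."""
    found = parse_config(text)
    problems = []
    for option, wanted in recommended.items():
        actual = found.get(option)  # None when unset or absent
        if actual != wanted:
            problems.append((option, wanted, actual))
    return sorted(problems, key=lambda item: item[0])
```

To use it, pass the contents of the `.config` from the kernel build tree to `check_config()` and review the returned mismatches. Architecture-specific values (such as the 32768 minimum mmap address on arm and arm64) would need their own tables.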
kernel command line options
# Enable slub/slab allocator free poisoning (requires CONFIG_SLUB_DEBUG=y above).
slub_debug=P

# Enable buddy allocator free poisoning (requires CONFIG_PAGE_POISONING=y above).
page_poison=1
x86_64
# Remove vsyscall entirely to avoid it being a fixed-position ROP target of any kind.
# (Same as CONFIG_LEGACY_VSYSCALL_NONE=y above.)
vsyscall=none
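Whether a booted system actually received these options can be checked by parsing its command line, which Linux exposes in `/proc/cmdline`. The sketch below is illustrative only: `parse_cmdline`, `missing_cmdline_options`, and `WANTED_CMDLINE` are hypothetical names, the tokenizer is a naive whitespace split that ignores quoted values, and the comparison is an exact match (so a broader setting such as `slub_debug=FZP` would be reported as missing).

```python
# Options recommended for all architectures; add "vsyscall": "none" on x86_64.
WANTED_CMDLINE = {"slub_debug": "P", "page_poison": "1"}


def parse_cmdline(text):
    """Split a kernel command line into {name: value}; bare flags map to None."""
    params = {}
    for token in text.split():
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_cmdline_options(text, wanted=WANTED_CMDLINE):
    """Return the 'name=value' strings from `wanted` that are not set exactly."""
    params = parse_cmdline(text)
    return [f"{name}={value}" for name, value in wanted.items()
            if params.get(name) != value]
```

On a running system the input would typically come from reading `/proc/cmdline`; the functions take plain text so they can also be run against a bootloader configuration line.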
sysctls
# Try to keep kernel address exposures out of various /proc files (kallsyms, modules, etc).
kernel.kptr_restrict = 1

# Avoid kernel memory address exposures via dmesg.
kernel.dmesg_restrict = 1

# Block non-uid-0 profiling (needs distro patch, otherwise this is the same as "= 2")
kernel.perf_event_paranoid = 3

# Turn off kexec, even if it's built in.
kernel.kexec_load_disabled = 1

# Avoid non-ancestor ptrace access to running processes and their credentials.
kernel.yama.ptrace_scope = 1
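The sysctl values above can be audited at run time, since each sysctl name maps to a file under `/proc/sys` with the dots replaced by slashes. This is a rough sketch under stated assumptions: the names `sysctl_path`, `read_sysctl`, `weak_sysctls`, and `WANTED_SYSCTLS` are hypothetical, and it treats any value greater than or equal to the recommendation as acceptable, which is an assumption that happens to suit these five integer settings but does not hold for sysctls in general.

```python
# Recommended minimum values, taken from the list above.
WANTED_SYSCTLS = {
    "kernel.kptr_restrict": 1,
    "kernel.dmesg_restrict": 1,
    "kernel.perf_event_paranoid": 3,
    "kernel.kexec_load_disabled": 1,
    "kernel.yama.ptrace_scope": 1,
}


def sysctl_path(name):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return "/proc/sys/" + name.replace(".", "/")


def read_sysctl(name):
    """Read an integer sysctl from /proc/sys (raises OSError if unavailable)."""
    with open(sysctl_path(name)) as handle:
        return int(handle.read().strip())


def weak_sysctls(wanted=WANTED_SYSCTLS, reader=read_sysctl):
    """Return {name: current} for settings below the recommended value.

    The current value is None when the sysctl cannot be read, for example
    when the feature (such as Yama) is not built into the kernel.
    """
    weak = {}
    for name, minimum in wanted.items():
        try:
            current = reader(name)
        except (OSError, ValueError):
            current = None
        if current is None or current < minimum:
            weak[name] = current
    return weak
```

Calling `weak_sysctls()` with no arguments reads the live system; passing a different `reader` allows checking captured values instead.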