The Linux Security Workgroup has put together this page in an effort to bring the Linux security community together in hardening the Linux Kernel and to help prevent duplication of efforts. There are a number of desired Linux Kernel hardening projects listed below that are inactive and do not have an owner. If you would like to take ownership of one of these projects or have an update for this page, please email the kernel-hardening mailing list at email@example.com.
Security Code Review Guidelines
This project is an effort to provide a reference that educates subsystem maintainers on what to look for when performing security reviews and audits. It would cover various classes of common coding vulnerabilities and how to detect them, as well as other best practices, such as not leaving private keys lying around.
This project would provide support for determining whether patches have been modified or tampered with since they were signed.
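As a rough illustration of the tamper check described above, the following Python sketch compares a patch's current SHA-256 digest with the digest recorded when it was signed. The function names and the idea of a separately stored digest are assumptions made for this example. A real workflow would also verify a cryptographic signature (for instance an OpenPGP signature) over the patch, which is not shown here.

```python
import hashlib
import hmac


def patch_digest(patch_bytes: bytes) -> str:
    """Return the SHA-256 digest of a patch as a hex string."""
    return hashlib.sha256(patch_bytes).hexdigest()


def is_unmodified(patch_bytes: bytes, recorded_digest: str) -> bool:
    """Check a patch against the digest recorded at signing time.

    Assumption: recorded_digest comes from a trusted, signed source.
    compare_digest avoids leaking how many characters matched.
    """
    return hmac.compare_digest(patch_digest(patch_bytes), recorded_digest)
```

A caller would compute `patch_digest()` once at signing time, store the result alongside the signature, and call `is_unmodified()` before applying the patch.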
Verification of Critical Subsystems
This project would provide verification of critical subsystems such as:
- Network file systems
- Cryptographic library
- Kernel build infrastructure
This could include approaches such as manual audits, static analysis, and fuzz testing.
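To make the fuzz testing approach concrete, here is a minimal Python sketch of a random-input fuzzing loop. The parser `parse_record` is a hypothetical toy target invented for this example, not kernel code, and real kernel fuzzing would rely on dedicated tooling and much richer input generation.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Hypothetical toy parser: byte 0 is the payload length."""
    if not data:
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:1 + length]
    if len(payload) != length:
        raise ValueError("truncated payload")
    return payload


def fuzz(target, iterations=1000, seed=0, max_len=16):
    """Feed byte strings to target and collect unexpected exceptions."""
    rng = random.Random(seed)      # fixed seed keeps runs reproducible
    inputs = [b""]                 # seed corpus: always try the empty input
    for _ in range(iterations):
        size = rng.randrange(max_len + 1)
        inputs.append(bytes(rng.randrange(256) for _ in range(size)))

    crashes = []
    for data in inputs:
        try:
            target(data)
        except ValueError:
            pass                   # clean rejection of malformed input
        except Exception as exc:   # anything else counts as a finding
            crashes.append((data, repr(exc)))
    return crashes
```

Treating `ValueError` as the only acceptable failure is a design choice for this sketch: the target is expected to reject bad input explicitly, and any other exception is reported together with the input that triggered it.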
There are several kernel hardening features that have appeared in other hardened operating systems and that would improve the security of Linux. Some have been controversial, so the descriptions below try to capture the controversy and the discussion over the years, so that as much information as possible is available for making an educated decision about potential implementations.
Partial NX Emulation
Non-executable memory is likely one of the most important protections in modern computing. Hardware support exists for it in modern CPUs, but many systems do not benefit from this security.
To simulate the execute bit in the kernel's memory page tables, the segment limit of the CS register is used to split memory into two regions. Checking whether an address lies above or below the CS-limit is fast. Executable regions are loaded below the CS-limit. The approach is fast but not perfectly accurate, since the BSS regions of loaded libraries remain in the executable region. It does, however, separate the loaded libraries (with their BSS) and the text segment from the brk heap, the mmap heap, and the stack regions.
Versions of this patch have been carried by RedHat, SUSE, Openwall, grsecurity and others for a long time.
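The CS-limit split can be modeled in a few lines of Python. This is only a userspace sketch of the idea, not the patch itself: the `Mapping` type, the helper names, and every address in `LAYOUT` are hypothetical values chosen to show why a library's BSS ends up executable while the heap and stack do not.

```python
from dataclasses import dataclass


@dataclass
class Mapping:
    name: str
    start: int
    end: int          # exclusive end address
    wants_exec: bool  # True if the mapping was requested as executable


def cs_limit(mappings):
    """Place the limit just above the highest executable mapping."""
    return max((m.end for m in mappings if m.wants_exec), default=0)


def can_execute(addr, limit):
    """Everything below the limit is executable, wanted or not."""
    return addr < limit


# Hypothetical 32-bit layout; all addresses are illustrative only.
LAYOUT = [
    Mapping("libc text", 0x00110000, 0x00250000, True),
    Mapping("libc bss", 0x00250000, 0x00254000, False),
    Mapping("program text", 0x08048000, 0x08050000, True),
    Mapping("brk heap", 0x09000000, 0x09021000, False),
    Mapping("stack", 0xBFFDF000, 0xC0000000, False),
]
```

With this layout the limit lands at the end of the program text, so the library BSS sits below it and stays executable (the inaccuracy noted above), while the heap and stack fall above it.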
Many administrators attempt to contain potentially exploitable services in chroots. Unfortunately, chroots are not designed to be a security protection (they are for development and debugging). It is possible to reasonably contain a non-privileged process in a chroot, but attempting to contain a root user is fraught with pitfalls. While it is certainly possible to patch the kernel to have a hardened chroot() (for example, grsecurity has a large set of protections that lock down chroots), so many behaviors are changed that the result comes into conflict with the more common development configurations.
Solutions are varied. One method of chroot escape is to move the current working directory outside the current chroot via a second chroot() call (others include using /proc/*/cwd, fchdir(), and ptrace). This single flaw is trivial to fix, but fixing it does not block the other avenues, so the gain is very small compared with the downside of carrying a delta from the upstream kernel.
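The double-chroot escape relies on the process still being able to call chroot(), which needs the CAP_SYS_CHROOT capability. The following Python sketch shows a common order of operations for confining a service as a non-privileged user so that, under default capability settings, that call is no longer available. The function name `enter_chroot` and the `ops` parameter (present only so the sequence can be exercised without root) are assumptions made for this example, and the sketch does not address the other escape avenues listed above.

```python
import os


def enter_chroot(new_root, uid, gid, ops=os):
    """Enter a chroot and drop root privileges (needs root to run for real).

    ops defaults to the os module; passing a stand-in object lets the call
    sequence be checked without privileges (an assumption of this sketch).
    """
    if uid == 0:
        raise ValueError("refusing to stay root inside the chroot")
    ops.chroot(new_root)  # requires CAP_SYS_CHROOT
    ops.chdir("/")        # make sure the cwd is inside the new root
    ops.setgroups([])     # drop supplementary groups while still root
    ops.setgid(gid)       # change group before giving up root
    ops.setuid(uid)       # give up root last
```

The order matters: group changes must happen while the process is still root, and the working directory is reset right after chroot() so that no directory handle outside the new root is left as the cwd.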
A better solution is to side-step the problem entirely. Since these security protections are being designed correctly with containers (see CLONE_NEW*), it would be better to use containers or MAC from the start when trying to isolate a service.
Some links to the history of this discussion:
- 2007 Sep, David Newall http://lkml.indiana.edu/hypermail/linux/kernel/0709.3/0721.html
Past objections and rebuttals could be summarized as:
- Violates POSIX.
  - POSIX didn't consider or really define this situation, and it's not useful to follow a broken specification at the cost of security.
- Might break debootstrap, debian-installer, and anything else that expects to chroot() within a chroot.
  - True, but maybe disallowing double-chroot is okay.
- Can escape chroots in a large number of ways; containers are better.
  - Fix each flaw. Containers are not very easy to use yet.
Additional Kernel Hardening Development Projects
The following are additional upstream Linux kernel projects that would make it harder for security vulnerabilities to become exploitable.
Note: Many CONFIG_* items below refer to PaX and grsecurity.
- remove remaining kernel address leaks that prevent ASLR from being effective
  - inet_diag NETLINK socket addresses
- chase down const-ification of function pointers
- examine page permissions and get rid of rwx mappings
- implement __read_only for things that can't really be const, like CONFIG_PAX_KERNEXEC
- disable set_kernel_text_rw() and friends via sysctl
- module autoloading control, like CONFIG_GRKERNSEC_MODHARDEN
- block hibernation image attacks (Vasiliy Kulikov)
- copy_*_user() hardening, like CONFIG_PAX_USERCOPY
  - keep length under INT_MAX
  - validate targets against compiler knowledge of static buffers, or look up buffer sizes from the heap allocator
- User/Kernel memory segmentation, like CONFIG_PAX_MEMORY_UDEREF or Intel SMEP
- Kernel stack ASLR, like CONFIG_PAX_RANDKSTACK
- Kernel stack clearing, like CONFIG_PAX_STACKLEAK
- Kernel refcount overflow protection, like CONFIG_PAX_REFCOUNT
- kernel symbol name hiding, like CONFIG_GRKERNSEC_HIDESYM
- add -Wextra and perform associated cleanups
- restricted access to vm86-related syscall/features, like CONFIG_HARDEN_VM86 in Linux 2.4.x-ow, but turned into a sysctl
- ability to set/lock/force a process (and/or any children it might spawn) to 32-bit only or 64-bit only (or implement a general "personality lock" and have main/compat syscall availability be actually affected by the current personality, which is currently not the case)
  - this will be particularly useful with container-based virtualization (LXC, OpenVZ, vserver), where the container startup program will lock the bitness/personality before launching the container's /sbin/init (e.g., a prctl() affecting _only_ child processes - e.g., not yet vzctl, but the container's /sbin/init - will do for this purpose)
- whitelist filesystem module autoloading, similar to the rare network module blacklist
- linking restrictions (CONFIG_GRKERNSEC_LINK), see above... (Kees Cook)
- fifo restrictions (CONFIG_GRKERNSEC_FIFO), closely related to the linking restrictions mentioned above
- mprotect hardening (CONFIG_PAX_MPROTECT)
- segv respawn restriction (CONFIG_GRKERNSEC_BRUTE)
- /proc visibility restriction (CONFIG_GRKERNSEC_PROC_USER)
- safer set*uid() behavior on error (instead of failing and returning, raise SIGSEGV if the call has to fail because of a resource shortage); this was implemented unconditionally in Linux 2.4.x-ow but needs different treatment for 2.6.x/upstream (maybe sysctl'able)
- destroy shm not in use (CONFIG_HARDEN_SHM from Linux 2.4.x-ow), which is needed to prevent RLIMIT_AS*RLIMIT_NPROC bypasses
- nx-emulation (RedHat Exec-Shield, CONFIG_PAX_SEGMEXEC, or better yet CONFIG_PAX_PAGEEXEC)
- ASCII-armor ASLR (RedHat Exec-Shield)
  - needs serious entropy improvement if it should be used at all
  - at least with RHEL5'ish kernels (not tested on Ubuntu specifically), exec-shield appears to provide ASCII-armor for mmap'ed shared libs with 32-bit kernels, but does not do it when running 32-bit binaries on 64-bit kernels (64-bit bins are OK) - looks like a code bug (or incomplete implementation) to chase down and fix (this is needed for our own use regardless of upstream submission)
- "enforcing" mode for W^X (ignore GNU ELF flags), sysctl'able and/or per process tree and/or per-container
- TARPIT netfilter target https://bugs.launchpad.net/ubuntu/+source/linux/+bug/78361
- CAPs-less ping: http://marc.info/?l=linux-kernel&m=129434182105135
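The two checks named under the copy_*_user() hardening item in the list above (a cap on the copy length and validation against a known buffer size) can be modeled in userspace as follows. This is a Python sketch of the idea only, not the CONFIG_PAX_USERCOPY implementation: the function name `copy_allowed` and the `object_size` parameter are assumptions made for this example, with `object_size` standing in for whatever size the compiler or heap allocator would report.

```python
INT_MAX = 2**31 - 1  # limit of a signed 32-bit C int


def copy_allowed(length, object_size=None):
    """Decide whether a copy of `length` bytes should go ahead.

    object_size is the known size of the kernel-side buffer, or None
    when neither the compiler nor the allocator can supply one.
    """
    if length < 0 or length > INT_MAX:
        return False  # negative or absurdly large length: likely overflow
    if object_size is not None and length > object_size:
        return False  # the copy would run past the end of the buffer
    return True
```

When no size is known, the sketch falls back to the length cap alone, which mirrors the idea that the size lookup is a best-effort check layered on top of the cheaper limit.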