IMA Namespacing design considerations
Our goals are to enable IMA-measurement, IMA-appraisal, and IMA-audit inside a container using Linux namespaces. The intention is to introduce an IMA namespace.
IMA-measurement: extends the concept of “trusted boot” to the running OS. Based on policy, as files are accessed, executed, or mmapped, a hash of the file data is calculated, added to the IMA measurement list, and, if a TPM is enabled, used to extend the TPM. The current builtin IMA-measurement Trusted Computing Base (TCB) policy measures all files read by root or executed/mmapped by any user. It also measures all kernel modules and firmware when they are loaded, as well as the IMA policy itself.
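The measure-and-extend step can be sketched as follows. This is a simplified illustration, not the kernel implementation: it assumes SHA-256 and extends with the file hash directly, and the names `extend`, `measure`, and `replay` are hypothetical. A verifier can replay the measurement list and compare the result against the value held by the TPM.

```python
import hashlib

# Assumption for this sketch: a SHA-256 register, initially all zeros.
PCR_SIZE = hashlib.sha256().digest_size


def extend(pcr: bytes, digest: bytes) -> bytes:
    """TPM-style extend: new value = H(old value || digest)."""
    return hashlib.sha256(pcr + digest).digest()


def measure(measurement_list: list, pcr: bytes, path: str, data: bytes) -> bytes:
    """Hash the file data, append it to the list, and return the extended register."""
    digest = hashlib.sha256(data).digest()
    measurement_list.append((path, digest.hex()))
    return extend(pcr, digest)


def replay(measurement_list: list) -> bytes:
    """Recompute the register value from the measurement list alone."""
    pcr = bytes(PCR_SIZE)
    for _path, hexdigest in measurement_list:
        pcr = extend(pcr, bytes.fromhex(hexdigest))
    return pcr
```

Because each extend folds the previous value into the next hash, the final value depends on both the content and the order of all measurements, so entries cannot be removed or reordered without the replay diverging.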
IMA-appraisal: extends the concept of “secure boot” to the running OS. Based on policy, as files are accessed, executed, or mmapped, the file hash is calculated and verified against the known good value stored in the security.ima xattr. The security.ima xattr may hold either a file hash or a signature. The keys for verifying file data signatures are found in the IMA-specific keyrings .ima or _ima.
IMA-audit: adds system audit records containing the file hash to the system audit log. The IMA-audit records can be used to augment existing security analytics software and be used for system forensics.
Namespacing each of these features requires not only adding IMA namespacing support but also some additional kernel changes. Our goal is ultimately to enable IMA-measurement, IMA-appraisal, and IMA-audit inside a container using Linux namespaces. Namespacing these different aspects of IMA is a major undertaking and needs to be staged in manageable pieces.
IMA Namespacing Considerations
When namespacing IMA, we want to prevent users from abusing namespaces to do things that go undetected. A primary concern is the activity of root in the TCB. Since root has all rights on the system, root could try to abuse this power by spawning new IMA namespaces and doing things there that affect the TCB but go undetected due to weaknesses in the IMA namespacing implementation. The following design points are meant to guide the implementation and prevent such problems:
Support for IMA in namespaces should enable the following:
- IMA policy for container (similar to the host):
  - there should be an initial default policy for every IMA namespace that measures activities inside the container
  - the uids in policy rules are relative to the uids of the active user namespace; uid=0 refers to root inside the user namespace
  - just as the existing builtin policies can be replaced once with a custom policy, the namespace policy can be replaced once with a user-defined custom policy. Both the initial and the custom namespace IMA policies would be independent of the host policy.
  - CAP_SYS_ADMIN currently gates the setting of the IMA policy:
    - setting the policy should be possible without the almighty CAP_SYS_ADMIN
    - we may want to gate this with a new capability CAP_INTEGRITY_ADMIN that allows a user to set the IMA policy during container runtime
- IMA policy extensions due to namespacing:
  - an IMA policy should allow rules that define whether activities in (all) child namespaces are to be measured (huge logs on the host) and audited or not; a use case for not measuring may be found in cloud environments where containers come and go and the log on the host could eat up a lot of memory
  - to prevent (host) root from spawning new IMA namespaces and doing things undetected in the TCB, all activities of root must be measured and audited in all IMA namespaces, independent of whether the policy enables logging or auditing in child namespaces
  - The existing builtin policies assume policy rules are based on the global “uid” or “fowner”, not on the namespaced “uid” or “fowner”. Instead of explicitly including specific “uid” or “fowner” rules for each container, allow rules to be specified in terms of the namespaced “uid” or “fowner”. For example, “measure func=FILE_CHECK mask=^MAY_READ uid=0 ns” means measure all files opened for read by root in the namespace, and “appraise fowner=0 ns” means appraise all files owned by root in the namespace.
  - The measurement list size is currently unbounded. Additional rules, which measure files opened by root in the namespace or appraise files owned by root in the namespace, will add to system memory pressure.
- IMA-measurement:
  - to prevent (host) root from spawning new IMA namespaces and doing things undetected in the TCB, all activities of root must be measured and audited in all IMA namespaces, independent of whether the policy enables logging or auditing in child namespaces
  - activities of all other users, including the container-root user, would only be subject to the policy set in the IMA namespace
- IMA-audit:
  - to prevent (host) root from spawning new IMA namespaces and doing things undetected in the TCB, all activities of root must be measured and audited in all IMA namespaces, independent of whether the policy enables logging or auditing in child namespaces
  - activities of all other users, including the container-root user, would only be subject to the policy set in the IMA namespace
- IMA-appraisal and keys:
  - each IMA namespace should have its own keyring so that each container can have its files signed with different keys
  - the keys (certificates) for verifying signatures may be found inside containers
  - it should be possible to enforce that only certified keys are loaded onto a keyring, similar to .ima on the host
  - the CA public key used for verifying the public keys (certificates) that in turn verify signatures may be found inside the container or could be known to the container management stack
- IMA-appraisal and namespacing:
  - If IMA-appraisal is active on the host (per policy rules on the host), what is supposed to happen when (host) root executes files in a (nested) IMA namespace where an empty IMA policy has been set? We would measure and audit root's activities as described above. What about appraising?
    Would we traverse all the IMA namespaces back to the init_ima_ns, evaluate signatures against the appraisal policy set there, and assume we would always find the keys in the init_user_ns? Maybe the following would be a solution for appraising file accesses by (host) root, with the key used for signature verification assumed to be in the init_user_ns; this is a step after evaluating the file access with the current IMA namespace's policy and the currently active USER namespace where the key can be found:

        for imans from current-IMA-NS backwards up to and including init_ima_ns:
            if policy(imans) has appraisal rules for this file:
                if file appraisal fails
                    fail access
                else
                    allow access
                break

    or simplified (again after evaluating file access with the current IMA namespace's policy and the currently active USER namespace where the key can be found):

        Appraise with policy of init_ima_ns and key found in .ima or _ima keyring of init_user_ns.

- TPM and measurements:
  - The IMA namespace that holds the logs should be configurable to extend PCRs; since the single TPM of the host cannot be shared by containers, each IMA namespace would have to be associated with its own TPM instance (vTPM); measurements in the initial IMA namespace are extended into the hardware TPM, as is done already
  - Each IMA namespace should only have access to the sysfs entries of its own TPM instance; ideally, sysfs would only show a single TPM device entry when viewed from an IMA namespace; an alternative may be that all devices are shown but read/write access to their files is refused if it is initiated from the 'wrong' IMA namespace
- Extended attribute security.ima:
  - A container should be able to set the security.ima extended attribute
    - this should be possible without the almighty CAP_SYS_ADMIN
    - we may want to gate this with a new capability CAP_SECURITY_XATTR_ADMIN that allows setting security extended attributes inside a container, possibly only during container build-time
- Extended attribute security.ima and bind mounting:
  - It may be necessary for different namespaces to be able to sign the same bind-mounted file with different keys (I am thinking of bind-mounted files that the container management stack modifies and that may need to be signed for the container to be able to access them.)
  - Extended attributes (such as security.ima) may need to be virtualizable (security.ima vs. security.ima@uid=1000 etc.)
- SecurityFS:
  - every IMA namespace should have (read/write) access to the entries that are associated with its IMA namespace
  - the organization of IMA's securityfs directory structure should reflect the child-parent relationship of IMA namespaces
  - there should be a directory called 'namespaces' where each child namespace would have a directory with the name of the IMA namespace's inode ('IMANS:4768263432') that leads to the files holding the information about that namespace
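The traversal proposed above for appraising file accesses by (host) root can be sketched as below. This is only a model of the idea under stated assumptions: `ImaNamespace`, `appraised_paths`, and `appraise_as_host_root` are hypothetical names, a policy is reduced to a set of paths that have appraisal rules, and `verify` stands in for signature verification with a key assumed to be in the init_user_ns. The sketch reads the proposal as "the nearest namespace with appraisal rules for the file decides" and allows access when no namespace has such rules; both points are interpretations, not settled design.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Optional


@dataclass(frozen=True)
class ImaNamespace:
    """Hypothetical model of an IMA namespace with a parent link."""
    name: str
    parent: Optional["ImaNamespace"] = None
    # Stand-in for a policy: the paths that have appraisal rules.
    appraised_paths: FrozenSet[str] = frozenset()


def appraise_as_host_root(current: ImaNamespace, path: str,
                          verify: Callable[[str], bool]) -> bool:
    """Walk from the current namespace back to the initial one.

    The first namespace whose policy has appraisal rules for the file
    decides whether access is allowed; the walk then stops.
    """
    ns: Optional[ImaNamespace] = current
    while ns is not None:
        if path in ns.appraised_paths:
            return verify(path)  # fail or allow access, then stop
        ns = ns.parent
    return True  # assumption: no appraisal rule anywhere means allow
```

The simplified variant from the text corresponds to skipping the loop and consulting only the policy of the initial namespace.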
Possible abuse scenarios may include switching through the namespaces (UTS, PID, IPC, NET, USER, CGROUP, MNT). I am not sure what is supposed to happen other than logging the activity in the currently active IMA namespace:
What should happen with IMA logging, appraisal, and auditing if we setns() through all available
- PID namespaces and send signals: log, appraise, and audit file activity following IMA policy, with special handling for (host) root
- IPC namespaces and send messages via IPC: same as for PID
- UTS namespaces and set the hostname: same as for PID
- NET namespaces and send network traffic: same as for PID
- CGROUP namespaces and configure cgroups: same as for PID
- USER namespaces: should the keys of this USER namespace now be active, or the keys of the original USER namespace used during the clone()? [we may need to adapt the current implementation...] Other than that, same as for PID?
- MNT namespaces and access files or execute programs: same as for PID; if the active IMA namespace policy requires file appraisal, files would need to be signed with a key from the keyring in the current USER namespace
IMA namespaces and IMA policy semantics
The following shows IMA policy rules and their semantics when applied to IMA namespaces:
1) audit func=BPRM_CHECK
2) audit func=BPRM_CHECK ns
3) measure func=BPRM_CHECK
The interpretation of these IMA policy rules is as follows:
1) Files executed in the IMA namespace that has this policy rule and its child namespaces are audited once
2) Files executed in a child namespace of the IMA namespace that has this policy rule are audited, even if already audited in the IMA namespace that has this policy rule or another namespace
3) Files executed in the IMA namespace that has this policy rule and its child namespaces are measured once
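One way to read the difference between a rule with and without the `ns` modifier is as a difference in how "already audited" is tracked. The sketch below is an interpretation under assumptions, not the implementation: `AuditRule` and its fields are hypothetical, and namespaces and files are identified by plain strings. Without `ns`, the rule remembers files across the owning namespace and all its children; with `ns`, it remembers them per namespace.

```python
class AuditRule:
    """Hypothetical model of an 'audit func=BPRM_CHECK [ns]' rule."""

    def __init__(self, per_namespace: bool):
        # per_namespace=True models the 'ns' modifier.
        self.per_namespace = per_namespace
        self._seen = set()

    def should_audit(self, ns_id: str, file_id: str) -> bool:
        """Return True the first time this rule sees the file.

        With 'ns' the file is tracked per namespace, so the same file
        is audited again when executed in another namespace.
        """
        key = (ns_id, file_id) if self.per_namespace else file_id
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

For example, a rule created with `AuditRule(per_namespace=False)` audits `/bin/ls` only on its first execution regardless of namespace, while `AuditRule(per_namespace=True)` audits it once in each namespace where it is executed.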
Note: Initially, the init_ima_ns will be the only IMA namespace that will have a policy.
Standalone IMA namespace versus IMA namespace attached to MOUNT namespace or USER namespace
1) The first set of posted patches attached the IMA namespace to the MOUNT namespace and shared the CLONE_NEWNS flag. Whenever a new MOUNT namespace was created, a new IMA namespace was created as well. Similarly, a setns() on a MOUNT namespace would also join the conjoint IMA namespace. The file measurement and appraisal rules of an IMA policy would operate on the files in the MOUNT namespace. The key used for the appraisal would be in the currently setns()'d USER namespace (the current implementation of IMA would need to be fixed in that regard). This proposed implementation of conjoint MOUNT and IMA namespaces was rejected.
2) Another choice is to attach the IMA namespace to the USER namespace. An IMA file measurement and appraisal policy would become active when the conjoint USER and IMA namespaces are joined, for example using setns(). A side effect is that joining a USER namespace activates an IMA policy which, if appraisal is active, starts appraising file accesses and may therefore deny some of them.
3) The last choice is to have IMA be a stand-alone namespace that is spawned using its own CLONE flag or by writing to a (securityfs) file. An IMA file measurement and appraisal policy would be activated when the IMA namespace is joined, for example using setns(). Only if the appropriate set of MOUNT namespaces and USER namespace, providing file signatures and keys for signature verification respectively, is also joined will file appraisal result in working file accesses; otherwise file accesses may be denied.
The last two choices have their advantages and disadvantages. In order to avoid side effects on existing USER namespaces, the third choice seems better suited. However, a system with IMA appraisal active in IMA namespaces will have restrictions when switching through MNT and possibly USER namespaces using setns(). These restrictions relate to file appraisal and possible file access denials, as well as to file measurements.