[PATCH v5 0/4] Introduce security_create_user_ns()

Fri Aug 26 16:57:19 UTC 2022

> On Aug 26, 2022, at 8:02 AM, Paul Moore <paul at paul-moore.com> wrote:
> 
> On Thu, Aug 25, 2022 at 6:42 PM Song Liu <songliubraving at fb.com> wrote:
>>> On Aug 25, 2022, at 3:10 PM, Paul Moore <paul at paul-moore.com> wrote:
>>> On Thu, Aug 25, 2022 at 5:58 PM Song Liu <songliubraving at fb.com> wrote:
> 
> ...
> 
>>>> I am new to user_namespace and security work, so please pardon me if
>>>> anything below is very wrong.
>>>> 
>>>> IIUC, user_namespace is a tool that enables trusted userspace code to
>>>> control the behavior of untrusted (or less trusted) userspace code.
>>>> Failing create_user_ns() doesn't make the system more reliable.
>>>> Specifically, we call create_user_ns() via two paths: fork/clone and
>>>> unshare. For both paths, we need the userspace to use user_namespace,
>>>> and to honor failed create_user_ns().
>>>> 
>>>> On the other hand, I would echo that killing the process is not
>>>> practical in some use cases. Specifically, allowing the application to
>>>> run in a less secure environment for a short period of time might be
>>>> much better than killing it and taking down the whole service. Of
>>>> course, there are other cases where security is more important, and
>>>> taking down the whole service is the better choice.
>>>> 
>>>> I guess the ultimate solution is a way to enforce using user_namespace
>>>> in the kernel (if it ever makes sense...).
>>> 
>>> The LSM framework, and the BPF and SELinux LSM implementations in this
>>> patchset, provide a mechanism to do just that: kernel enforced access
>>> controls using flexible security policies which can be tailored by the
>>> distro, solution provider, or end user to meet the specific needs of
>>> their use case.
>> 
>> In this case, I wouldn't say the kernel is enforcing access control.
>> (I might be wrong). There are 3 components here: kernel, LSM, and
>> trusted userspace (whoever calls unshare).
> 
> The LSM layer and the LSMs themselves are part of the kernel; look at
> the changes in this patchset to see the LSM, BPF LSM, and SELinux
> kernel changes.  Explaining how the different LSMs work is quite a bit
> beyond the scope of this discussion, but there is plenty of
> information available online that should be able to serve as an
> introduction, not to mention the kernel source itself.  However, in
> very broad terms you can think of the individual LSMs as somewhat
> analogous to filesystem drivers, e.g. ext4, and the LSM itself as the
> VFS layer.

Thanks for the explanation. This matches my understanding of LSM.

> 
>> AFAICT, kernel simply passes
>> the decision made by LSM (BPF or SELinux) to the trusted userspace. It
>> is up to the trusted userspace to honor the return value of unshare().
> 
> With an LSM enabled and enforcing a security policy on user namespace
> creation, which appears to be the case of most concern, the kernel
> would make a decision on the namespace creation based on various
> factors (e.g. for SELinux this would be the calling process' security
> domain and the domain's permission set as determined by the configured
> security policy) and, if the policy denied the operation, no namespace
> would be created and an error code returned to userspace.  It is the exact
> same thing as what would happen if the calling process is chrooted or
> doesn't have a proper UID/GID mapping.  Don't forget that the
> create_user_ns() function already enforces a security policy and
> returns errors to userspace; this patchset doesn't add anything new in
> that regard, it just allows for a richer and more flexible security
> policy to be built on top of the existing constraints.

I don't think I understand user namespaces well enough to agree or
disagree here. I guess I should read more.
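
For reference, the userspace side of the failure path described above
can be sketched as below. This is only a minimal illustration, not code
from the patchset: probe_user_ns() is a hypothetical helper name, and it
forks first purely so the caller's own namespaces are left untouched.
The child calls unshare(CLONE_NEWUSER) and reports the errno it got (or
0) through its exit status.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* for unshare() and CLONE_NEWUSER */
#endif
#include <errno.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the patchset): try to create a user
 * namespace in a forked child.  Returns 0 if the kernel allowed it, or
 * a positive errno value if the kernel refused.
 */
int probe_user_ns(void)
{
	pid_t pid = fork();
	int status;

	if (pid < 0)
		return errno;

	if (pid == 0) {
		/*
		 * Child: the decision has already been made inside the
		 * kernel by the time unshare() returns.  On failure no
		 * namespace exists; we only learn why via errno.
		 */
		if (unshare(CLONE_NEWUSER) == 0)
			_exit(0);
		_exit(errno);   /* Linux errno values fit in 8 bits */
	}

	/* Parent: collect the child's verdict. */
	if (waitpid(pid, &status, 0) < 0)
		return errno;
	if (!WIFEXITED(status))
		return ECHILD;  /* arbitrary choice for abnormal exit */

	return WEXITSTATUS(status);
}
```

The point of the sketch is that a caller ignoring a nonzero result does
not end up with the namespace anyway; it simply keeps running without
one. In the chrooted and unmapped UID/GID cases the existing checks
return EPERM, and a denial from an LSM would come back through the same
return path, with the errno chosen by the LSM.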

Thanks,
Song

> 
>> If the userspace simply ignores unshare failures, or does not call
>> unshare(CLONE_NEWUSER), kernel and LSM cannot do much about it, right?
> 
> The process is still subject to any security policies that are active
> and being enforced by the kernel.  A malicious or misconfigured
> application can still be constrained by the kernel using both the
> kernel's legacy Discretionary Access Controls (DAC) as well as the
> more comprehensive Mandatory Access Controls (MAC) provided by many of
> the LSMs.
> 
> -- 
> paul-moore.com