[PATCH v3] cred: Propagate security_prepare_creds() error code

Tue Jun 14 18:59:08 UTC 2022

On 6/14/22 11:30 AM, Eric W. Biederman wrote:
> Frederick Lawler <fred at cloudflare.com> writes:
> 
>> On 6/13/22 11:44 PM, Eric W. Biederman wrote:
>>> Frederick Lawler <fred at cloudflare.com> writes:
>>>
>>>> Hi Eric,
>>>>
>>>> On 6/13/22 12:04 PM, Eric W. Biederman wrote:
>>>>> Frederick Lawler <fred at cloudflare.com> writes:
>>>>>
>>>>>> While experimenting with the security_prepare_creds() LSM hook, we
>>>>>> noticed that our EPERM error code was not propagated up the callstack.
>>>>>> Instead ENOMEM is always returned.  As a result, some tools may send a
>>>>>> confusing error message to the user:
>>>>>>
>>>>>> $ unshare -rU
>>>>>> unshare: unshare failed: Cannot allocate memory
>>>>>>
>>>>>> A user would think that the system didn't have enough memory, when
>>>>>> instead the action was denied.
>>>>>>
>>>>>> This problem occurs because prepare_creds() and prepare_kernel_cred()
>>>>>> return NULL when security_prepare_creds() returns an error code. Later,
>>>>>> functions calling prepare_creds() and prepare_kernel_cred() return
>>>>>> ENOMEM because they assume that a NULL return means the memory
>>>>>> allocation failed.
>>>>>>
>>>>>> Fix this by propagating an error code from security_prepare_creds() up
>>>>>> the callstack.
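
To make the failure mode in the patch description concrete, here is a
minimal userspace sketch. It is not the patch itself. The ERR_PTR(),
PTR_ERR() and IS_ERR() helpers are simplified stand-ins for the kernel's
include/linux/err.h, and fake_security_prepare_creds(), the two
prepare_creds_*() variants and the two callers are hypothetical names
invented for illustration only.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified userspace stand-ins for the kernel's include/linux/err.h. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct cred { int placeholder; };

/* Hypothetical hook: pretend the LSM policy denies the request. */
static int fake_security_prepare_creds(struct cred *new)
{
	(void)new;
	return -EPERM;
}

/* NULL-returning variant: the hook's error code is discarded. */
static struct cred *prepare_creds_null(void)
{
	struct cred *new = calloc(1, sizeof(*new));

	if (!new)
		return NULL;
	if (fake_security_prepare_creds(new) < 0) {
		free(new);
		return NULL;	/* -EPERM is lost here */
	}
	return new;
}

/* ERR_PTR-returning variant: the hook's error code is carried upward. */
static struct cred *prepare_creds_errptr(void)
{
	struct cred *new = calloc(1, sizeof(*new));
	int ret;

	if (!new)
		return ERR_PTR(-ENOMEM);
	ret = fake_security_prepare_creds(new);
	if (ret < 0) {
		free(new);
		return ERR_PTR(ret);
	}
	return new;
}

/* A caller that can only guess: NULL is reported as -ENOMEM. */
static int caller_null(void)
{
	struct cred *new = prepare_creds_null();

	if (!new)
		return -ENOMEM;
	free(new);
	return 0;
}

/* A caller that reports whatever error the hook produced. */
static int caller_errptr(void)
{
	struct cred *new = prepare_creds_errptr();

	if (IS_ERR(new))
		return (int)PTR_ERR(new);
	free(new);
	return 0;
}
```

With the hook denying the request, caller_null() reports -ENOMEM while
caller_errptr() reports -EPERM, which is the difference userspace sees
in the unshare error message.
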
>>>>> Why would it make sense for security_prepare_creds to return an error
>>>>> code other than ENOMEM?
>>>>> That seems a bit of a violation of what that function is supposed to do.
>>>>>
>>>>
>>>> The API allows LSM authors to decide what error code is returned from the
>>>> cred_prepare hook. security_task_alloc() is a similar hook, and has its return
>>>> code propagated.
>>> It is not an API.  It is an implementation detail of the Linux kernel.
>>> It is a set of convenient functions that do a job.
>>> The general rule is we don't support cases without an in-tree user.  I
>>> don't see an in-tree user.
>>>
>>>> I'm proposing we follow security_task_alloc()'s pattern, and add visibility
>>>> for failure cases in prepare_creds().
>>> I am asking why we would want to.  Especially as it is not an API, and I
>>> don't see any good reason for anything but an -ENOMEM failure to be
>>> supported.
>>>
>> We're writing an LSM BPF policy, and not a new LSM. Our policy aims to solve
>> unprivileged unshare, similar to Debian's patch [1]. We're in a position such
>> that we can't use that patch because we can't block _all_ of our applications
>> from performing an unshare. We prefer a granular approach. LSM BPF seems like a
>> good choice.
> 
> I am quite puzzled: why doesn't /proc/sys/user/max_user_namespaces work
> for you?
> 

We have the following requirements:

1. Tasks that match our allow-list criteria may create namespaces
2. The root user must be able to create namespaces at any time
3. Everything not covered by 1 or 2 must be denied

We use per-task attributes to decide whether to allow or deny the
current call to unshare().

The /proc/sys/user/max_user_namespaces limit is too coarse for this
level of detail.
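
As a rough illustration of the three requirements above, the decision
could look something like the following userspace sketch. This is not
our actual BPF program. struct task_attrs, its fields, the allow-list
entries and userns_create_policy() are all hypothetical, and the real
criteria may use different per-task attributes.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical per-task attributes the policy inspects. */
struct task_attrs {
	uid_t euid;		/* effective user ID of the caller */
	const char *comm;	/* command name of the caller */
};

/* Hypothetical allow list; the entries are placeholders. */
static const char *const allow_list[] = {
	"trusted-sandbox",
	"build-runner",
};

static bool on_allow_list(const char *comm)
{
	size_t i;

	for (i = 0; i < sizeof(allow_list) / sizeof(allow_list[0]); i++) {
		if (strcmp(comm, allow_list[i]) == 0)
			return true;
	}
	return false;
}

/* Returns 0 to allow namespace creation, -EPERM to deny it. */
static int userns_create_policy(const struct task_attrs *t)
{
	if (on_allow_list(t->comm))	/* requirement 1 */
		return 0;
	if (t->euid == 0)		/* requirement 2 */
		return 0;
	return -EPERM;			/* requirement 3 */
}
```

The point of returning -EPERM rather than a generic failure is that the
denial is a policy decision, not a resource shortage.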

>> Because LSM BPF exposes these hooks, we should probably treat them as an
>> API. From that perspective, userspace expects unshare to return an EPERM
>> when the call is denied.
> 
> The BPF code gets to be treated as an out-of-tree kernel module.
> 
>>> Without an in-tree user that cares it is probably better to go the
>>> opposite direction and remove the possibility of returning anything but
>>> memory allocation failure.  That will make it clearer to implementors
>>> that a general error code is not supported and this is not a location
>>> to implement policy, this is only a hook to allocate state for the LSM.
>>>
>>
>> That's a good point, and it's possible we're using the wrong hook for the
>> policy. Do you know of other hooks we can look into?
> 
> Not off the top of my head.
> 
>>>>> I have probably missed a very interesting discussion where that was
>>>>> mentioned, but I don't see a link to the discussion or anything
>>>>> explaining why we want to do that in this change.
>>>>>
>>>>
>>>> AFAIK, this is the start of the discussion.
>>> You were on v3 and had an out of tree piece of code so I assumed someone
>>> had at least thought about why you want to implement policy in a piece
>>> of code whose only purpose is to allocate memory to store state.
>>>
>>
>> No worries.
>>
>>> Eric
>>>
>>>
>>
>> Links:
>> 1: https://sources.debian.org/patches/linux/3.16.56-1+deb8u1/debian/add-sysctl-to-disallow-unprivileged-CLONE_NEWUSER-by-default.patch/
> 
> Eric