Preferred subj= with multiple LSMs

Wed Jul 17 23:02:16 UTC 2019

On 7/17/2019 9:23 AM, Paul Moore wrote:
> On Wed, Jul 17, 2019 at 11:49 AM Casey Schaufler <casey at schaufler-ca.com> wrote:
>> On 7/17/2019 5:14 AM, Paul Moore wrote:
>>> On Tue, Jul 16, 2019 at 7:47 PM Casey Schaufler <casey at schaufler-ca.com> wrote:
>>>> On 7/16/2019 4:13 PM, Paul Moore wrote:
>>>>> On Tue, Jul 16, 2019 at 6:18 PM Casey Schaufler <casey at schaufler-ca.com> wrote:
>>>>>> It sounds as if some variant of the Hideous format:
>>>>>>
>>>>>>         subj=selinux='a:b:c:d',apparmor='z'
>>>>>>         subj=selinux/a:b:c:d/apparmor/z
>>>>>>         subj=(selinux)a:b:c:d/(apparmor)z
>>>>>>
>>>>>> would meet Steve's searchability requirements, but with significant
>>>>>> parsing performance penalties.
>>>>> I think "hideous format" sums it up nicely.  Whatever we choose here
>>>>> we are likely going to be stuck with for some time and I'm near to
>>>>> 100% that multiplexing the labels onto a single field is going to be a
>>>>> disaster.
>>>> If the requirement is that subj= be searchable I don't see much of
>>>> an alternative to a Hideous format. If we can get past that, and say
>>>> that all subj_* have to be searchable we can avoid that set of issues.
>>>> Instead of:
>>>>
>>>>         s = strstr(source, "subj=");
>>>>         search_after_subj(s, ...);
>>> This example does a lot of hand waving in search_after_subj(...)
>>> regarding parsing the multiplexed LSM label.  Unless we restrict the
>>> LSM label formats (which seems both wrong, and too late IMHO)
>> I don't think it's too late, and I think it would be healthy
>> to restrict LSM "contexts" to character sets that make command
>> line specification possible. Embedded newlines? Ewwww.
> That would imply that the delimiter you would choose for the
> multiplexed approach would be something odd (I think you suggested
> 0x02, or similar, earlier) which would likely require the multiplexed
> subj field to become a hex encoded field which would be very
> unfortunate in my opinion and would technically break with the current
> subj/obj field format spec.  Picking a normal-ish delimiter, and
> restricting its use by LSMs seems wrong to me.

Just say "no" to hex encoding! BTW, keys are not hex encoded.

We've never had to think about having general rules on
what security modules do before, because with only one
active each could do whatever it wanted without fear of
conflict. If there is already a character that none of
the existing modules use, how would it be wrong to
reserve it?

Smack disallows the four characters '"/\ because quoting
is too important to ignore and the likelihood that someone
would confuse labels with paths seemed great. I sniffed
around a little, but couldn't find the sets for SELinux or
AppArmor.
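
As a rough sketch of what reserving characters could look like, the
check below rejects any label containing one of the four characters
listed above for Smack. The helper name label_has_reserved_char() is
hypothetical and is not taken from the Smack code; a shared rule for
all modules would have to agree on its own set first.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* The four characters named above as disallowed by Smack: ' " / \ */
static const char reserved_chars[] = "'\"/\\";

/*
 * Hypothetical helper: returns true if the label contains any
 * reserved character and so could not be used unquoted.
 */
static bool label_has_reserved_char(const char *label)
{
        assert(label != NULL);
        return strpbrk(label, reserved_chars) != NULL;
}
```

With this set, "a:b:c:d" passes and "a/b" is rejected, so a quote or
slash would remain available as a delimiter.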

> It's also worth noting that if you were to move subj/obj to hex
> encoded fields, in addition to causing a backwards compatibility
> problem, you completely kill the ability to look at the raw log data
> and make sense of the fields ... well, unless you can do the ascii hex
> conversion in your head on the fly.

Agreed, even though there was a time when I could do
hex decoding in both ASCII and EBCDIC on the fly.

>>>  we have
>>> a parsing nightmare; can you write a safe multiplexed LSM label parser
>>> without knowledge of each LSM label format?  Can you do that for each
>>> LSM without knowing their loaded policy?  What happens when the policy
>>> and/or label format changes?  What happens in a few years when another
>>> LSM is added to the kernel?
>> I was intentionally hand-wavy because of those very issues.
> Then you should already realize why this is a terrible idea ;)

Unfortunately, I'm facing two options, one of which the
kernel maintainer thinks is a bad idea and the other the
user space maintainer thinks is a bad idea. Plus, I'm not
very happy with either, either.

>> Steve says that parsing is limited to "strstr()", so looking for
>> ":s7:" in the subject should work just as well with a Hideous
>> format as it does today, with the exception of false positives
>> where LSMs have label string overlaps.
> Today when you go to search through your audit log you know that a
> single LSM is providing subj labels, and you also know which LSM that
> happens to be, so searching on a given string, or substring, is easy
> and generally safe.  In a multiplexed approach this becomes much more
> difficult, and depending on the search being done it could be
> misleading, perhaps even dangerous with complicated searches that
> exclude label substrings.

I'm aware of this issue, which is one of the reasons I'm
asking about the preferred approach.
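
To make the false positive concrete, here is a small sketch of a
strstr()-style search over a whole subj= value. The label strings in
the usage note are invented for illustration and do not come from any
real policy, and subj_matches() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: substring search over the entire subj= value,
 * with no knowledge of which module supplied which part of it.
 */
static bool subj_matches(const char *subj_value, const char *needle)
{
        assert(subj_value != NULL && needle != NULL);
        return strstr(subj_value, needle) != NULL;
}
```

Searching for ":s7:" matches "u:r:t:s7:c0" as it does today, but it
also matches "selinux='u:r:t:s0',apparmor='x:s7:y'", where the hit
comes from the second module's label. A search that excludes records
containing ":s7:" would wrongly drop that record.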

> It's important to remember that Steve's strstr() comment only reflects
> his set of userspace tools.  When you start talking about log
> aggregation and analytics, it seems very likely that there are other
> tools in use, likely with their own parsers that do much more
> complicated searches than a simple strstr() call.

Point. But long term, they'll have to be updated to accommodate
whatever we decide on. Which makes the "simple" case, where one
security module is in use, all the more important.

>> Where is the need to use a module specific label parser coming
>> from? Does the audit code parse SELinux contexts now?
> If you can't pick a "safe" delimiter that isn't included in any of the
> LSM label formats, how else do you know how to parse the multiplexed
> mess?

Ah, but if we can ...

>>>> we have
>>>>
>>>>         s = source;
>>>>         for (i = 0; i < lsm_slots ; i++) {
>>>>                 s = strstr(s, "subj_");
>>>>                 if (!s)
>>>>                         break;
>>>>                 s = search_after_subj_(s, lsm_slot_name[i], ...);
>>> The hand waving here in search_after_subj_(...) is much less;
>>> essentially you just match "subj_X" and then you can take the field
>>> value as the LSM's label without having to know the format, the policy
>>> loaded, etc.  It is both safer and doesn't require knowledge of the
>>> LSMs (the LSM "name" can be specified as a parameter to the search
>>> tool).
>> You can do that with the Hideous format as well. I wouldn't
>> say which would be easier without delving into the audit user
>> space.
> No, you can't.  You still need to parse the multiplexed mess, that's
> the problem.

You move the parsing problem to the record, where you have to
look for subj_selinux= instead of having the parsing problem in
the subj= field, where you look for something like selinux=
within the field. Neither looks like the work of an afternoon to
get right.
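
The two lookups can be sketched side by side. This is only an
illustration under assumptions: fields are separated by single spaces,
the multiplexed form is the quoted variant from earlier in the thread,
and no label contains a space or a single quote. The helper names
find_value() and hideous_value() are hypothetical and are not audit
user space code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Find "key" in s, accepting a match only at the start of s or right
 * after one of the characters in "before". Returns a pointer to the
 * value and stores its length (up to "term" or end of string) in *len.
 */
static const char *find_value(const char *s, const char *key,
                              const char *before, char term, size_t *len)
{
        size_t klen = strlen(key);
        const char *p = s;

        assert(s != NULL && key != NULL && before != NULL && len != NULL);
        while ((p = strstr(p, key)) != NULL) {
                if (p == s || strchr(before, p[-1]) != NULL) {
                        const char *v = p + klen;
                        const char *e = strchr(v, term);

                        *len = e ? (size_t)(e - v) : strlen(v);
                        return v;
                }
                p++;
        }
        return NULL;
}

/*
 * Multiplexed lookup: locate the subj= field, then look for an entry
 * such as selinux=' inside it. A match past the end of the field is
 * rejected.
 */
static const char *hideous_value(const char *rec, const char *lsm_key,
                                 size_t *len)
{
        size_t flen;
        const char *field = find_value(rec, "subj=", " ", ' ', &flen);
        const char *v;

        if (!field)
                return NULL;
        v = find_value(field, lsm_key, ",", '\'', len);
        if (v && v >= field + flen)
                return NULL;
        return v;
}
```

For separate fields the call is
find_value(rec, "subj_selinux=", " ", ' ', &len); for the multiplexed
form it is hideous_value(rec, "selinux='", &len). The second only
works if the quote character is reserved, which is the delimiter
question raised above.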

It probably looks like I'm arguing for the Hideous format option.
That would require less work and code disruption, so it is tempting
to push for it. But I would have to know the user space side a
whole lot better than I do to feel good about pushing anything that
isn't obviously a good choice. I kind of prefer Paul's "subj=?"
approach, but as it's harder, I don't want to spend too much time
on it if it gets me a big, juicy, well deserved NAK.