Apparmor move_mount mediation breaks mount tool in containers
John Johansen
john.johansen at canonical.com
Wed Dec 6 19:21:12 UTC 2023
On 12/6/23 06:12, Christian Brauner wrote:
> On Tue, Dec 05, 2023 at 10:34:33AM -0800, John Johansen wrote:
>> On 12/5/23 09:08, Christian Brauner wrote:
>>> On Tue, Dec 05, 2023 at 09:45:35AM +0100, Linux regression tracking (Thorsten Leemhuis) wrote:
>>>> [CCing Linus]
>>>>
>>>> Hi, top-posting for once, to make this easily accessible to everyone.
>>>>
>>>> Stéphane, many thx for your insights.
>>>>
>>>> I'm CCing Linus on this, as I guess he wants to be aware of this.
>>>>
>>>> Normally I try to sum up things somewhat when doing so, but this here
>>>> likely won't end well. So all I'm saying is: commit 157a3537d6bc28
>>>> ("apparmor: Fix regression in mount mediation") [v6.7-rc1] that was also
>>>> included in v6.6.3 broke containers in some setups. John described that
>>>> commit with the words "it is a far from good solution. It is a stop
>>>> gap." and later mentioned that "a CVE is coming for this and that having
>>>> this unmediated in the tree isn't good either."
>>>>
>>>> For more details please check out the thread that on lore started with
>>>> this message:
>>>>
>>>> https://lore.kernel.org/all/CA+enf=u0UmgjKrd98EYkxFu7FYV8dR1SBYJn_1b0Naq=3twbbQ@mail.gmail.com/
>>>>
>>>> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
>>>> --
>>>> Everything you wanna know about Linux kernel regression tracking:
>>>> https://linux-regtracking.leemhuis.info/about/#tldr
>>>> If I did something stupid, please tell me, as explained on that page.
>>>>
>>>> On 05.12.23 07:57, Stéphane Graber wrote:
>>>>> On Mon, Dec 4, 2023 at 9:20 AM Linux regression tracking (Thorsten
>>>>> Leemhuis) <regressions at leemhuis.info> wrote:
>>>>>>
>>>>>> On 04.12.23 14:14, John Johansen wrote:
>>>>>>> On 12/3/23 17:34, Stéphane Graber wrote:
>>>>>>
>>>>>> Side note: Stéphane, thx for CCing the regressions list.
>>>>>>
>>>>>>>> On Sun, Dec 3, 2023 at 8:21 PM John Johansen
>>>>>>>> <john.johansen at canonical.com> wrote:
>>>>>>>>> On 12/2/23 17:20, Stéphane Graber wrote:
>>>>>>>>>>
>>>>>>>>>> Upstream commit 157a3537d6bc28ceb9a11fc8cb67f2152d860146 which just
>>>>>>>>>> landed in 6.6.3 stable as 96af45154a0be30485ad07f70f852b1456cb13d7 is
>>>>>>>>>> blocking new mounts for all LXC, LXD and Incus users (at least) on
>>>>>>>>>> distributions using the newer version of util-linux.
>>>>>>>>>>
>>>>>>>>>> That's because for a simple mount like "mount -t tmpfs tmpfs /tmp",
>>>>>>>>>> the new mount command now performs:
>>>>>>>>>> ```
>>>>>>>>>> fsconfig(3, FSCONFIG_SET_STRING, "source", "tmpfs", 0) = 0
>>>>>>>>>> fsconfig(3, FSCONFIG_CMD_CREATE, NULL, NULL, 0) = 0
>>>>>>>>>> fsmount(3, FSMOUNT_CLOEXEC, 0) = 4
>>>>>>>>>> statx(4, "", AT_STATX_SYNC_AS_STAT|AT_EMPTY_PATH, STATX_MNT_ID,
>>>>>>>>>> {stx_mask=STATX_BASIC_STATS|STATX_MNT_ID,
>>>>>>>>>> stx_attributes=STATX_ATTR_MOUNT_ROOT, stx_mode=S_IFDIR|S_ISVTX|0777,
>>>>>>>>>> stx_size=40, ...}) = 0
>>>>>>>>>> move_mount(4, "", AT_FDCWD, "/tmp", MOVE_MOUNT_F_EMPTY_PATH) = 0
>>>>>>>>>> ```
>>>>>>>>>>
>>>>>>>>>> That last call to "move_mount" is incorrectly interpreted by AppArmor
>>>>>>>>>> as an attempt to move-mount "/" to "/mnt" rather than as a new mount
>>>>>>>>>> being created, this therefore results in:
>>>>>>>>>> ```
>>>>>>>>>> Dec 03 01:05:03 kernel-test kernel: audit: type=1400
>>>>>>>>>> audit(1701565503.599:34): apparmor="DENIED" operation="mount"
>>>>>>>>>> class="mount" info="failed perms check" error=-13
>>>>>>>>>> profile="incus-a_</var/lib/incus>" name="/tmp/" pid=2190 comm="mount"
>>>>>>>>>> srcname="/" flags="rw, move"
>>>>>>>>>> ```
>>>
>>> move_mount() doesn't create a new mount it just moves an already
>>> existing mount around in the tree. Either that mount is moved
>>> (fsmount(), open_tree(OPEN_TREE_CLONE)) from it's source location or it
>>> was a detached mount to begin with.
>>>
>> yes, all to aware of this
>>
>>> The new mount api splits mounting across multiple system calls. And the
>>> state it keeps for itself is gone once you create a mount via fsmount().
>>
>> yep
>>
>>> Because then the fs_context will be destroyed and you're only dealing
>>> with a superblock and mount(s) for it.
>>>
>> yep
>>
>>> You shouldn't care about move_mount() moving a detached mount because
>>> really it is not a move from some other filesystem location. The
>>> creation of that mount will have already happened in
>>> open_tree(OPEN_TREE_CLONE) or in fsmount(). So that ship has sailed and
>>> move_mount() is the wrong place to do this.
>>>
>> yes and no. What we are specifically dong is mediating existing attached
>> mounts as always and blocking move_mount from attaching the detached
>> mount via move_mount. Which is very much something move_mount mediation
>> should be able to do.
>
> Yes, but my point is that it's often at the wrong point in time. You
> can mediate this earlier as well - see my open_tree() comment below.
>
yes, there needs to be mediation happening earlier as well. The goal
here wasn't complete new mount api mediation, which is something that
we have been meaning to address but haven't been able to yet.
This was specifically about addressing move_mount() which can also
be used to move existing mounts in the system. While fixing that the
question is do we care about detached mounts, yes, and should we
address them to the point that we can within the current mediation,
that is to gate them on very course is mount mediation allowed, and from
a security perspective that is a yes too.
Under the new mount api, even if we are mediating at the other points
brought up below, we still want to be mediating the move_mount
because apparmor very much cares where these mounts end up in
the mount tree.
>>
>> It works as a stop gap for the new mount api bypassing the LSM at a very
>> course level. This is why it requires a very general mount rule for
>> apparmor to allow move_mount of the detached mount. A conditional that
>> improves control around this is coming, but we can't make move_mount()
>> mediation provide the full set of apparmor old mount mediation.
>>
>> Providing something close to what we were doing before is going to
>> require new hooks, and likely specialized states tailored to the
>> new mount api. Until that does happen it does mean a reduction in
>> what apparmor policy can mediate. Today instead of being able to
>> specify details about the mount you need a generic mount rule that
>> just allows the application to do pretty much anything with mounts,
>> soon you will be able to have an addition conditional that allows
>> better control of move mount, and better logging of what is going on.
>
> If you want to start doing full mediation for the new mount api you're
> likely missing quite a few pieces.
>
> I expect you need a hook in at least fsmount() as this always creates a
> detached mount for a new filesystem context (You touched on that in an
> earlier mail.).
>
yes
> But you also need a new hook in open_tree() but _only_ in the
> OPEN_TREE_CLONE path because that creates detached mounts as well.
>
yes
> And you will also very likely want a new hook in mount_setattr() because
> that allows you to change mount properties as well and the creation of
> idmapped mounts and so on.
agreed.
More information about the Linux-security-module-archive
mailing list