[PATCH v5 00/14] integrity: Introduce the Integrity Digest Cache

Mon Sep 16 10:05:51 UTC 2024

On Thu, Sep 05, 2024 at 05:05:29PM +0200, Roberto Sassu wrote:

Good morning, I hope the week is starting well for everyone

Apologies for the delay in getting these thoughts out, scrambling to
catch up on my e-mail backlog.

I looped Linus in, secondary to the conversations surrounding the PGP
verification infrastructure in the kernel, given that the primary use
case at this time appears to be the digest cache and his concerns
regarding that use.

Our proposed TSEM LSM, most recent submission here:

https://lore.kernel.org/linux-security-module/20240826103728.3378-1-greg@enjellic.com/T/#t

Is a superset of IMA functionality and depends heavily on file
checksums, hence our interest and reflections in your efforts with
this.

> From: Roberto Sassu <roberto.sassu at huawei.com>
> 
> Integrity detection and protection has long been a desirable feature, to
> reach a large user base and mitigate the risk of flaws in the software
> and attacks.
> 
> However, while solutions exist, they struggle to reach a large user base,
> due to requiring higher than desired constraints on performance,
> flexibility and configurability, that only security conscious people are
> willing to accept.

No argument here, inherent in better and more effective security
architectures is better useability, pure and simple.

> For example, IMA measurement requires the target platform to collect
> integrity measurements, and to protect them with the TPM, which
> introduces a noticeable overhead (up to 10x slower in a
> microbenchmark) on frequently used system calls, like the open().

The future for trusted systems will not be in TPM's, as unpopular a
notion as that may be in some circles.  They represent a design from a
quarter century ago that struggles to have relevance with our current
system architectures.

If a TPM is present, TSEM will extend the security coefficients for
the root modeling namespace into a PCR to establish a root of trust
that the rest of the trust orchestration system can be built on.  Ours
is a worst case scenario beyond IMA since there is a coefficient
generated for each LSM call that is being modeled.

We had to go to asynchronous updates through an ordered workqueue in
order to have something less than abysmal performance, even with
vTPM's running in a Xen hypervisor domain.  This is without the
current performance impacts being discussed with respect to HMAC based
TPM session authentication.

> IMA Appraisal currently requires individual files to be signed and
> verified, and Linux distributions to rebuild all packages to include
> file signatures (this approach has been adopted from Fedora
> 39+). Like a TPM, also signature verification introduces a
> significant overhead, especially if it is used to check the
> integrity of many files.
>
> This is where the new Integrity Digest Cache comes into play, it
> offers additional support for new and existing integrity solutions,
> to make them faster and easier to deploy.
>
> The Integrity Digest Cache can help IMA to reduce the number of TPM
> operations and to make them happen in a deterministic way. If IMA
> knows that a file comes from a Linux distribution, it can measure
> files in a different way: measure the list of digests coming from
> the distribution (e.g. RPM package headers), and subsequently
> measure a file if it is not found in that list.
>
> The performance improvement comes at the cost of IMA not reporting
> which files from installed packages were accessed, and in which
> temporal sequence. This approach might not be suitable for all use
> cases.

That, in and of itself, is certainly not the end of the world.

With TSEM we offer the notion of the 'state' of a security namespace,
which is the extension sum of the security coefficients after they
have been sorted in natural (big-endian) hash order.  In this model
you know what files have been accessed but you do not have a statement
on temporal ordering of access.

Given scheduling artifacts, let alone the almost absolute ubiquity of
multi-core, the simple TPM/TCG linear extension model seems to
struggle with respect to any relevancy as a security metric.

> The Integrity Digest Cache can also help IMA for appraisal. IMA can simply
> lookup the calculated digest of an accessed file in the list of digests
> extracted from package headers, after verifying the header signature. It is
> sufficient to verify only one signature for all files in the package, as
> opposed to verifying a signature for each file.
> 
> The same approach can be followed by other LSMs, such as Integrity Policy
> Enforcement (IPE), and BPF LSM.

As we've noted above, TSEM would also be a potential consumer, which
is why we wanted to seek clarifications on the architecture.

We've reviewed the patch set and the documentation, and will freely
admit that we may still misunderstand all of this, but it would seem
that the architecture, as it stands, would be subject to Time Of
Measurement Time Of Use (TOMTOU) challenges.

The Time Of Measurement will be when the distribution generates an
RPM, or equivalent construct, ie. .deb, and signs the digest list with
their packaging key.  What is elusive to us is how can their be an
expectation that the file, on medium, when accessed (Time Of Use),
matches the digest of the file that was signed by the distribution?

At a minimum, there would seem to be a need to have the kernel read
and validate the on medium checksum of the file, as the in-kernel RPM
parser reads each signature from the package list.  At that point, as
long as the kernel is running, the digest cache will represent a valid
statement on the cryptographic checksum of a file held in the digest
cache, as your patch series seem to have invalidation support well in
hand.

After a system reboot, it would seem to be that all bets are off, and
from a security perspective, there would be a need to re-verify that
the on medium file checksums match those from a signed digest list.
IMA has the ability to do protection against offline modification but
you are then back to a possibly expensive operation on each file
access.

We see in the thread on PGP infrastructure in the kernel you make the
following statement:

"If the calculated digest of a file being accessed matches one
extracted from the RPM header, access is granted otherwise it is
denied."

Which would seem to imply that you do compute the on-medium checksum
of each file and verify it against a reference value from the RPM
header, but it isn't clear where that happens in the patch series.
The only kernel based file read operation we could find is what
appears to be a call to read the digest list files.

IMA already has the concept of a digest cache, as does TSEM.  If you
need to read a file in order to match its medium based checksum
against the value from a package list, in order to avoid a TOMTOU
condition, it is unclear how one gains a performance improvement.
Unless of course the objective is to prime the digest cache at boot so
that all subsequent integrity verifications are answered from cache
rather than by computing the checksum at file access time.

In the thread on PGP access you indicate that all of this needs to be
in the kernel in order to be tamper proof.  FWIW, the kernel has the
ability to know if kernel + userspace should be trusted at any given
time, that is one of the security statements that we seek to offer
with TSEM.

If the kernel can make a judgement, that in a limited execution
context, such as system boot and initialization, that userspace has
not acted in an untrusted manner, it can punt verification and parsing
of RPM headers and priming of something like the digest cache to
userspace.

Again, apologies if we misunderstand the architecture, any
clarifications would be appreciated.

Have a good week.

As always,
Dr. Greg

The Quixote Project - Flailing at the Travails of Cybersecurity
              https://github.com/Quixote-Project