Bob and Alice in Kernel-land


Blue Screens Everywhere

Already dubbed “The Largest IT Outage In History,” the CrowdStrike update from July 18, 2024, has affected at least 8.5 million Windows devices, according to Microsoft. Many of these devices are critical assets and run multiple essential services. For instance, I was unable to pay for my coffee in Dubai because the payment systems used by the coffee shop were down, and a friend lost her passport while stranded in Barcelona due to flight disruptions. The full impact and scope of the incident remain unknown, and it is likely to be the main topic of discussion at DEFCON and BlackHat this summer and beyond.

Around 2010, many founders, whether in security or not, faced a critical question: “Kernel-mode or User-mode? Which should our agent be?” During this time, agent fatigue started becoming a real issue, with many customers reluctant to install additional agents on their machines. Moreover, the idea of adding agents that might require a kernel driver was met with significant skepticism.

Kernel mode obviously provides greater flexibility, information, and control. For example, when we developed our Windows-based application container solution, CloudVolumes, kernel access was essential. It allowed us to utilize mini-filters to access file system and registry operations, unlike other solutions such as ThinApp, which had a more restrictive design.

Security products were also seen as prime targets for vulnerability research due to their ideal characteristics:

  • Privileged access (Kernel or Administrator).
  • Complex parsers, increasing the attack surface.
  • Limited QA performed on them.

During a certain period, Tavis Ormandy conducted audits on multiple security products, successfully uncovering numerous vulnerabilities. The more complex the code placed in kernel mode, the greater the risk, a persistent issue with kernel drivers on any operating system.

A prime example is win32k, the Windows GUI Subsystem and Window Manager. A simple Google search for “win32k” will reveal why it has been a concern since Alex Ionescu first covered it in 2008. The saying “Too big to fail” certainly does not apply to kernel drivers; in fact, the opposite holds.

Some solutions, such as Capsule8, intentionally avoided using kernel modules from the beginning. Instead, they relied exclusively on kprobes/uprobes before eventually incorporating eBPF support. eBPF has since gained prominence for powering the Solana Virtual Machine. However, eBPF is still in an early stage on Windows and currently cannot provide as much telemetry as a kernel driver, making it an insufficient solution for now.

The issue with kernel modules and drivers is that they can be used and abused on any modern operating system. As long as arbitrary code can run on a machine, defense in depth is necessary, requiring higher standards for development. This is why efforts to port core drivers like win32k to Rust have begun. We can expect Microsoft, and likely Linux, to speed up their initiatives for the Rust driver development platform with projects like windows-drivers-rs.

Apple decided to promote System Extensions that operate in user space, gradually phasing out Kernel Extensions (kext). They also provided a limited API for third-party Endpoint Security solutions, but such an API is only “good enough” when your operating system isn’t a priority for attackers. Microsoft undertook a similar initiative with its User Mode Driver Framework, though it is primarily used for Plug and Play (PnP) and power management functionality rather than for monitoring or telemetry.

2008 Picture of Microsoft Research Singularity OS Team

The most straightforward way to prevent third-party drivers from causing system faults is to move them out of kernel space and prohibit third-party code execution in kernel mode. Unfortunately, this is unrealistic at the moment and would require a different OS design. This concept has been discussed for a long time, with various initiatives such as Singularity, the C#-based OS from Microsoft Research.

In the case of security solutions, you would naturally move complex code such as regexes and parsers out of kernel land, but this would come at a performance cost and prevent “real time attack detections”. One bug does not reflect the quality of software produced by a company, but one bug can really destroy the purpose of a product.
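To make the idea of keeping parsing out of kernel land more concrete, here is a minimal sketch of a defensive parser that could live in a user-mode service. Everything in it is an assumption made for illustration: the record layout (a field count followed by length-prefixed fields), the MAX_FIELDS limit, and the names parse_rule and ParseError are hypothetical and do not describe any vendor's actual format. The point is only that malformed input becomes an error value instead of an out-of-bounds access.

```rust
/// Errors a malformed blob can produce. Hypothetical names, for illustration only.
#[derive(Debug, PartialEq)]
enum ParseError {
    /// The blob ended before the declared data was fully read.
    Truncated,
    /// The blob declares more fields than this sketch accepts.
    TooManyFields(usize),
}

/// Arbitrary upper bound on fields per record (an assumption of this sketch).
const MAX_FIELDS: usize = 20;

/// Parses a hypothetical record laid out as:
///   [field_count: u8] then, per field, [len: u8][len bytes of data]
/// Every read is bounds-checked, so bad input yields Err rather than a fault.
fn parse_rule(blob: &[u8]) -> Result<Vec<&[u8]>, ParseError> {
    // First byte is the declared number of fields.
    let (&count, mut rest) = blob.split_first().ok_or(ParseError::Truncated)?;
    let count = count as usize;
    if count > MAX_FIELDS {
        return Err(ParseError::TooManyFields(count));
    }

    let mut fields = Vec::with_capacity(count);
    for _ in 0..count {
        // Each field starts with a one-byte length prefix.
        let (&len, tail) = rest.split_first().ok_or(ParseError::Truncated)?;
        let len = len as usize;
        if tail.len() < len {
            return Err(ParseError::Truncated);
        }
        let (field, tail) = tail.split_at(len);
        fields.push(field);
        rest = tail;
    }
    Ok(fields)
}
```

Under this split, only the small, validated result would need to cross into a kernel component, which is also where the performance cost mentioned above comes from.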

Initiatives like windows-drivers-rs will undoubtedly enhance the stability and safety of kernel modules. However, we are still in the early stages of writing kernel drivers in Rust. Additionally, the incentives for third-party companies to adopt Rust for driver development are limited compared to the effort required, and the result would most likely end up with a lot of unsafe {} code anyway. Convincing a large number of companies would also be necessary. For an idea of how many companies are potentially involved, you can refer to the list of allocated filter altitudes.
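As a rough illustration of why unsafe {} tends to remain, consider what happens whenever Rust code receives a raw pointer from a C-style interface. The sketch below is not taken from windows-drivers-rs or any real driver API: the RawBuffer type and the as_slice helper are hypothetical, and the example uses the standard library rather than a kernel environment. It only shows that the compiler cannot verify the pointer, so the author must vouch for it inside an unsafe block.

```rust
/// Hypothetical C-style buffer descriptor, similar in spirit to what a
/// callback-based driver interface might hand to Rust code.
#[repr(C)]
struct RawBuffer {
    data: *const u8,
    len: usize,
}

/// Safe wrapper that confines the unsafe operation to a single place.
/// Returns None for a null pointer instead of dereferencing it.
fn as_slice(buf: &RawBuffer) -> Option<&[u8]> {
    if buf.data.is_null() {
        return None;
    }
    // SAFETY (assumption of this sketch): whoever built `buf` guarantees that
    // `data` points to `len` readable bytes that stay valid while `buf` is
    // borrowed. The compiler cannot check this, hence the unsafe block.
    Some(unsafe { std::slice::from_raw_parts(buf.data, buf.len) })
}
```

The null check is the only validation Rust can do here; the length and validity of the memory still rest on trust in the caller, which is the same trust a C driver relies on.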

Having user-mode-only security solutions sounds nice, but it would mean a very high dependence on whatever telemetry the operating system vendor provides (e.g. Apple Endpoint Security APIs), which would simply shift the accountable party.

In conclusion, while memory-safe languages like Rust offer significant improvements, the true solution to security vulnerabilities lies in creating robust code, superior products, and better design. We must question when complexity becomes excessive, as it can undermine security efforts. Additionally, we need to consider whether security products are destined to become mere fancy user interfaces for vendor telemetry APIs. Balancing simplicity, robustness, and usability is crucial as we strive to build more secure systems in a world where computing systems are highly reliant on each other.