AMD, Apple, Qualcomm GPUs leak AI data in LeftoverLocals attacks

January 17, 2024 · 10:32 AM

A new vulnerability dubbed 'LeftoverLocals' affecting graphics processing units from AMD, Apple, Qualcomm, and Imagination Technologies allows retrieving data from the local memory space.

Tracked as CVE-2023-4969, the security issue enables data recovery from vulnerable GPUs, especially in the context of large language models (LLMs) and machine learning (ML) processes.

LeftoverLocals was discovered by Trail of Bits researchers Tyler Sorensen and Heidy Khlaaf, who reported it privately to the vendors before publishing a technical overview.

LeftoverLocals details

The security flaw stems from the fact that some GPU frameworks do not fully isolate memory, allowing one kernel running on the machine to read values that another kernel wrote to local memory.

The researchers explain that an adversary only needs the ability to run a GPU compute application (e.g., via OpenCL, Vulkan, or Metal) to read data another user left in the GPU's local memory.

"Using these, the attacker can read data that the victim has left in the GPU local memory simply by writing a GPU kernel that dumps uninitialized local memory" - Trail of Bits

LeftoverLocals lets attackers launch a 'listener' - a GPU kernel that reads from uninitialized local memory and can dump the data in a persistent location, such as the global memory.

If the local memory is not cleared, the attacker can use the listener to read values left behind by the 'writer' - a program that stores values to local memory.

An animation published by the researchers shows how the writer and listener programs interact, and how the latter can retrieve data from the former on affected GPUs.

The recovered data can reveal sensitive information about the victim's computations, including model inputs, outputs, weights, and intermediate computations.

In multi-tenant GPU environments running LLMs, LeftoverLocals can be used to listen in on other users' interactive sessions and recover from the GPU's local memory the data written by the victim's "writer" process.

The Trail of Bits researchers created a proof-of-concept (PoC) exploit to demonstrate LeftoverLocals and showed that an adversary can recover 5.5MB of data per GPU invocation, depending on the GPU framework.

On an AMD Radeon RX 7900 XT running llama.cpp, an open-source LLM framework, an attacker can harvest as much as 181MB per query, which is sufficient to reconstruct the LLM's responses with high accuracy.

Impact and remediation

Trail of Bits researchers discovered CVE-2023-4969 in September 2023 and informed CERT/CC to help coordinate the disclosure and patching efforts.

Mitigation efforts are underway: some vendors have already shipped fixes, while others are still developing and implementing a defense mechanism.

In Apple's case, the latest iPhone 15 is unaffected and fixes are available for A17 and M3 processors, but the issue persists on M2-powered computers.

AMD said that several of its GPU models remain vulnerable while its engineers investigate effective mitigation strategies.

Qualcomm has released a patch via firmware v2.0.7 that fixes LeftoverLocals in some chips, but others remain vulnerable.

Imagination released a fix in DDK v23.3 in December 2023. However, Google warned in January 2024 that some of the vendor's GPUs are still impacted.

Meanwhile, Intel, NVIDIA, and Arm have reported that the data leak problem doesn't impact their devices.

Trail of Bits suggests that GPU vendors implement an automatic local memory clearing mechanism between kernel calls, ensuring isolation of sensitive data written by one process.

While this approach might introduce some performance overhead, the researchers suggest that the trade-off is justified given the severity of the security implications.

Other potential mitigations include avoiding multi-tenant GPU environments in security-critical scenarios and implementing user-level mitigations.