Physical Backdoor Attack can Jeopardize Driving with Vision-Large-Language Models

Abstract

Vision-Large-Language-models(VLMs) have great application prospects in autonomous driving. Despite the ability of VLMs to comprehend and make decisions in complex scenarios, their integration into safety-critical autonomous driving systems poses serious security risks. In this paper, we propose BadVLMDriver, the first backdoor attack against VLMs for autonomous driving that can be launched in practice using physical objects. Unlike existing backdoor attacks against VLMs that rely on digital modifications, BadVLMDriver uses common physical items, such as a red balloon, to induce unsafe actions like sudden acceleration, highlighting a significant real-world threat to autonomous vehicle safety. To execute BadVLMDriver, we develop an automated pipeline utilizing natural language instructions to generate backdoor training samples with embedded malicious behaviors. This approach allows for flexible trigger and behavior selection, enhancing the stealth and practicality of the attack in diverse scenarios. We conduct extensive experiments to evaluate BadVLMDriver for two representative VLMs, five different trigger objects, and two types of malicious backdoor behaviors. BadVLMDriver achieves a 92% attack success rate in inducing a sudden acceleration when coming across a pedestrian holding a red balloon. Thus, BadVLMDriver not only demonstrates a critical security risk but also emphasizes the urgent need for developing robust defense mechanisms to protect against such vulnerabilities in autonomous driving technologies.

Framework

BadVLMDriver includes two main steps. In the first step, we synthesize a small number of backdoor training samples using instruction-guided generative models. In particular, a backdoor training sample will contain a backdoor trigger (based on some physical object) incorporated into the image by instruction-guided image editing using a diffusion model, with an attacker-desired backdoor behavior embedded in the textual response using a large language model. Then, in the second step, the victim VLM is visual-instruction tuned on the generated backdoor training samples and their benign ‘replays’ using a blended loss.

Attack Result1 (Trigger: Football + Behavior: Accelerate suddenly)

The attacked VLM suggest to accelerate suddenly, which will cause a collision with people riding motorcycles.

Attack Result2 (Trigger: Traffic cone + Behavior: Brake suddenly)

The attacked VLM suggest to brake suddenly, which is harmful to passengers in the vehicle.

Attack Result3 (Trigger: Red Balloon + Behavior: Accelerate suddenly)

The attacked VLM suggest to accelerate suddenly, which might lead to a collision with a child, out of sight, who is chasing the red balloon.

Ethics Statement

Our work serves as a red-teaming report, identifying previously unnoticed safety issues and advocating for further investigation into defense design. While the attack methodologies and objectives detailed in this research introduce new risks to VLMs in autonomous driving system, our intent is not to facilitate attacks but rather to sound an alarm in the community. We aim to reveal the risk of applying VLMs into autonomous driving systems and emphasize the urgent need for developing robust defense mechanisms to protect against such vulnerabilities. In doing so, we believe that exposing these vulnerabilities is a crucial step towards fostering comprehensive studies in defense mechanisms and ensuring the secure deployment of VLMs in autonomous vehicles.

BibTeX

@article{ni2024physical,
      title={Physical Backdoor Attack can Jeopardize Driving with Vision-Large-Language Models},
      author={Ni, Zhenyang and Ye, Rui and Wei, Yuxi and Xiang, Zhen and Wang, Yanfeng and Chen, Siheng},
      journal={arXiv preprint arXiv:2404.12916},
      year={2024}
    }