Published: June 5, 2025
Author(s)
Yining Luo (Northern Arizona University), Baobao Li (Northern Arizona University), Anoop Singhal (NIST), Pei-Yu Tseng (Penn State University), Lan Zhang (Northern Arizona University), Qingtian Zou (University of Texas Southwestern Medical Center at Dallas), Xiaoyan Sun (Worcester Polytechnic Institute), Peng Liu (Penn State University)
Conference
Name: 2025 ACM International Workshop on Security and Privacy Analytics
Dates: 06/04/2025 - 06/06/2025
Location: Pittsburgh, PA
Citation: IWSPA '25: Proceedings of the 2025 ACM International Workshop on Security and Privacy Analytics
Large Language Models (LLMs) have shown promise in automating code vulnerability repair, but their effectiveness in handling real-world code remains limited. This paper investigates the capability of LLMs in repairing vulnerabilities and proposes a systematic approach to enhance their performance through specialized prompt engineering. Through extensive evaluation of 5826 code samples, we found that while LLMs successfully repair vulnerabilities in simple cases, they struggle with complex real-world code that involves intricate dependencies, contextual requirements, and multi-file interactions. To address these limitations, we first incorporated Control Flow Graphs (CFGs) as supplementary prompts, achieving a 14.4% success rate in fixing previously unresolvable vulnerabilities. Through analysis of repair failures, we identified three primary challenge categories and developed corresponding prompt patterns incorporating techniques such as granular contextual information provision and progressive code simplification. Evaluation on real-world projects demonstrated that our approach significantly improved LLMs' repair capabilities, achieving over 85% success rates across all identified challenge categories. Our findings suggest that while LLMs have inherent limitations in handling complex vulnerabilities independently, they can become effective tools for automated vulnerability repair when guided by carefully crafted prompts.
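To make the CFG-as-supplementary-prompt idea concrete, the short Python sketch below assembles a repair prompt that pairs a vulnerable C++ snippet with a DOT rendering of its control flow graph. The example function, the snippet, and the prompt layout are illustrative assumptions rather than the paper's exact template; in practice the CFG would be produced by a static-analysis tool and the resulting prompt sent to the LLM under evaluation.

```python
# Minimal sketch of a CFG-augmented repair prompt (illustrative, not the
# paper's exact template). The vulnerable snippet and DOT graph below are
# hypothetical examples; a real pipeline would extract the CFG automatically.

VULNERABLE_SNIPPET = """\
void copy_input(const char *src) {
    char buf[16];
    strcpy(buf, src);   /* CWE-121: stack-based buffer overflow */
}
"""

CFG_DOT = """\
digraph copy_input {
    entry -> decl_buf;
    decl_buf -> call_strcpy;
    call_strcpy -> exit;
}
"""


def build_cfg_augmented_prompt(code: str, cfg_dot: str, cwe_id: str) -> str:
    """Combine vulnerable code with its control flow graph as supplementary context."""
    return (
        f"The following C++ code contains a {cwe_id} vulnerability.\n"
        "Use the control flow graph to reason about the execution paths, then\n"
        "return a repaired version that preserves the original functionality.\n\n"
        "--- Vulnerable code ---\n"
        f"{code}\n"
        "--- Control flow graph (DOT) ---\n"
        f"{cfg_dot}\n"
        "--- Repaired code ---\n"
    )


if __name__ == "__main__":
    prompt = build_cfg_augmented_prompt(VULNERABLE_SNIPPET, CFG_DOT, "CWE-121")
    print(prompt)  # pass this string to the LLM chat/completion API of choice
```

Serializing the CFG as plain DOT text keeps the supplementary context compact and lets the same prompt pattern extend to larger, multi-function code, under the assumption that the model can read graph structure from text.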
Keywords
software vulnerabilities; Large Language Models; C++ programming; prompt engineering