In the dynamic landscape of cybersecurity, curated knowledge plays a pivotal role in empowering security analysts to respond effectively to cyber threats. Cyber Threat Intelligence (CTI) reports offer valuable insights into adversary behavior, but their length, complexity, and inconsistent structure pose challenges for extracting actionable information. To address this, our research focuses on automating the extraction of attack techniques from CTI reports and mapping them to the MITRE ATT&CK framework. For this task, fine-tuning Large Language Models (LLMs) for downstream sequence classification shows promise due to their ability to comprehend complex natural language. However, fine-tuning LLMs requires vast amounts of annotated domain-specific data, which is costly and time-intensive to produce, relying on the expertise of security professionals. To meet these challenges, we propose ALERT, a novel cybersecurity framework that leverages active learning strategies in conjunction with an LLM. This approach dynamically selects the most informative instances for annotation, prioritizing the samples that contribute most to the model's learning and thereby optimizing the allocation of annotation resources. As a result, our framework achieves comparable performance with a dataset that is 77% smaller, making it more efficient for extracting and mapping attack techniques from CTI reports to the ATT&CK framework.
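The abstract does not specify which acquisition function ALERT uses, but a common way to realize "selecting the most informative instances" in active learning is entropy-based uncertainty sampling over the classifier's predicted distributions. The sketch below is a minimal, hypothetical illustration of that idea (the function name, toy probabilities, and choice of entropy as the criterion are assumptions, not details from the paper):

```python
import numpy as np

def select_most_informative(probs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k unlabeled samples whose predicted class
    distributions have the highest entropy (i.e., are most uncertain)."""
    # probs: (n_samples, n_classes) softmax outputs from the current model
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=1)
    # Sort by entropy, descending, and take the k most uncertain samples
    return np.argsort(entropy)[::-1][:k]

# Toy example: three unlabeled CTI sentences, four candidate techniques
probs = np.array([
    [0.97, 0.01, 0.01, 0.01],  # confident prediction -> low entropy
    [0.25, 0.25, 0.25, 0.25],  # uniform -> maximally uncertain
    [0.70, 0.10, 0.10, 0.10],  # moderately uncertain
])
print(select_most_informative(probs, 2))  # -> [1 2]
```

Only the selected indices would be sent to human annotators in each round; the model is then fine-tuned on the growing labeled pool and the cycle repeats.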