The cybersecurity risks, benefits, and capabilities of AI systems are central to both security and AI policy. As AI becomes increasingly integrated into various aspects of our lives, malicious exploitation of these systems becomes a significant threat. Generative AI models and products are particularly susceptible to attacks due to their complex nature and reliance on large amounts of data. Developers need a comprehensive assessment of cybersecurity risks in order to ensure the safety and reliability of AI systems, protect sensitive data, prevent system failures, and maintain public trust.
Meta AI introduces CYBERSECEVAL 3 to address these questions, focusing specifically on large language models (LLMs) such as the Llama 3 models. The previous benchmarks, CYBERSECEVAL 1 and 2, assessed various risks associated with LLMs, including exploit generation and insecure code outputs. These benchmarks highlighted the models’ susceptibility to prompt injection attacks and their propensity to assist in cyber-attacks. Building on CYBERSECEVAL 1 and 2, CYBERSECEVAL 3 extends the evaluation to new areas of offensive security capabilities. The benchmark measures the abilities of the Llama 3 405b, Llama 3 70b, and Llama 3 8b models in automated social engineering, scaling manual offensive cyber operations, and autonomous cyber operations.
To evaluate the offensive cybersecurity capabilities of Llama 3 models, the researchers conducted a series of empirical tests, including:
1. Automated Social Engineering via Spear-Phishing: Researchers simulated spear-phishing attacks using the Llama 3 405b model, comparing its performance to other models like GPT-4 Turbo and Qwen 2-72b-instruct. The assessment involved generating detailed victim profiles and evaluating the persuasiveness of the LLMs in phishing dialogues. Results showed that while Llama 3 405b could automate moderately persuasive spear-phishing attacks, it was not more effective than existing models, and risks could be mitigated by implementing guardrails like Llama Guard 3.
2. Scaling Manual Offensive Cyber Operations: The researchers assessed how well Llama 3 405b could assist cyberattackers in a “capture the flag” simulation. Participants included both experts and novices. The study found no statistically significant improvement in success rates or speed of completing cyberattack phases with the LLM compared to traditional methods like search engines.
3. Autonomous Offensive Cyber Operations: The team tested the Llama 3 70b and 405b models’ abilities to function autonomously as hacking agents in a controlled environment. The models performed basic network reconnaissance but failed in more advanced tasks like exploitation and post-exploitation actions. This indicated limited capabilities in autonomous cyber operations.
4. Autonomous Software Vulnerability Discovery and Exploitation: The researchers assessed the potential of LLMs to identify and exploit software vulnerabilities. The findings suggest that Llama 3 models did not outperform traditional tools and manual techniques in real-world scenarios. The CYBERSECEVAL 3 benchmark was based on zero-shot prompting, but Google's Naptime project demonstrated that results can be further improved through tool augmentation and agentic scaffolding.
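To make the "no statistically significant improvement" finding in item 2 more concrete, the sketch below shows one common way to compare success rates between two small groups: a two-sided Fisher exact test on a 2x2 table. This is only an illustration. The article does not say which statistical test the researchers used, and the participant counts here are hypothetical placeholders, not figures from the study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. with LLM, without LLM);
    columns are outcomes (success, failure).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability that row 1 contains x successes,
        # given the fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    # Sum the probabilities of every table that is no more likely
    # than the observed one (small tolerance for float comparison).
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= observed * (1 + 1e-9)
    )
    return min(1.0, p_value)


# HYPOTHETICAL counts, chosen only for illustration (not from the study):
# 8 of 15 participants succeed with the LLM, 7 of 15 with a search engine.
p = fisher_exact_two_sided(8, 7, 7, 8)
print(f"p-value: {p:.3f}")
print("significant at 0.05" if p < 0.05 else "not significant at 0.05")
```

With groups this small, a modest difference in success counts is well within what chance alone would produce, which is the sense in which an observed difference can fail to be statistically significant.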
In conclusion, Meta AI outlines the challenges of assessing LLM cybersecurity capabilities and introduces CYBERSECEVAL 3 to address them. By providing detailed evaluations and publicly releasing their tools, the researchers offer a practical approach to understanding and mitigating the risks posed by advanced AI systems. The evaluations show that while current LLMs, like Llama 3, exhibit some offensive capabilities, their risks can be managed through well-designed guardrails.
Check out the Paper and GitHub. All credit for this research goes to the researchers of this project.
Pragati Jhunjhunwala is a consulting intern at MarktechPost. She is currently pursuing her B.Tech at the Indian Institute of Technology (IIT), Kharagpur. She is a tech enthusiast with a keen interest in the scope of software and data science applications. She is always reading about developments in different fields of AI and ML.