Large language models (LLMs) are expanding in usage, posing new cybersecurity risks. These risks emerge from their core traits: heightened capability in code generation, heightened deployment for real-time code generation, automated execution within code interpreters, and integration into applications handling untrusted data. This poses the need for a robust mechanism for cybersecurity evaluations.
Prior works to evaluate LLMs’ security properties include open benchmark frameworks and position papers proposing evaluation criteria. CyberMetric, SecQA, and WMDP-Cyber employ a multiple-choice format similar to educational evaluations. CyberBench extends evaluation to various tasks within the cybersecurity domain, while LLM4Vuln concentrates on vulnerability discovery, coupling LLMs with external knowledge. Rainbow Teaming, an application of CYBERSECEVAL 1, automatically generates adversarial prompts similar to those used in cyberattack tests.
Meta researchers present CYBERSECEVAL 2, a benchmark for assessing LLMs security risks and capabilities, including prompt injection and code interpreter abuse testing. The benchmark’s open-source code facilitates the evaluation of other LLMs. Also, the paper introduces the safety-utility tradeoff, quantified by the False Refusal Rate (FRR), highlighting LLMs’ tendency to reject both unsafe and benign prompts, impacting utility. A robust test set evaluates FRR for cyberattack helpfulness risk, revealing LLMs’ ability to handle borderline requests while rejecting the most unsafe ones.
CyberSecEval 2 categorizes prompt injection assessment tests into logic-violating and security-violating types, covering a broad range of injection strategies. Vulnerability exploitation tests focus on challenging yet solvable scenarios, avoiding LLM memorization and targeting LLMs’ general reasoning abilities. In code interpreter abuse evaluation, LLM conditioning is prioritized alongside unique abuse categories, while a judge LLM assesses generated code compliance. This approach ensures comprehensive evaluation of LLM security across prompt injection, vulnerability exploitation, and interpreter abuse, promoting robustness in LLM development and risk assessment.
In CyberSecEval 2, tests revealed a decline in LLM compliance with cyberattack assistance requests, dropping from 52% to 28%, indicating growing awareness of security concerns. Non-code-specialized models, like Llama 3, showed better non-compliance rates, while CodeLlama-70b-Instruct approached state-of-the-art performance. FRR assessments unveiled variations, with ‘codeLlama-70B’ exhibiting a notably high FRR. Prompt injection tests demonstrated LLM vulnerability, with all models succumbing to injection attempts at rates above 17.1%. Code exploitation and interpreter abuse tests underscored LLMs’ limitations, highlighting the need for enhanced security measures.
The Key contributions of this research are the following:
- Researchers added robust prompt injection tests, evaluating 15 attack categories on LLMs.
- They introduced evaluations measuring LLM compliance with instructions aiming to compromise attached code interpreters.
- Included the assessment suite measuring LLM capabilities in creating exploits in C, Python, and Javascript, covering logic vulnerabilities, memory exploits, and SQL injections.
- Introduced a new dataset evaluating LLM FRR when prompted with cybersecurity tasks, showing helpfulness versus harmfulness tradeoff.
To conclude, this research introduces CYBERSECEVAL 2, a comprehensive benchmark suite for assessing LLM cybersecurity risks. Prompt injection vulnerabilities persist across all tested models (13% to 47% success), underscoring the need for enhanced guardrails. Measuring the False Refusal Rate effectively quantifies the safety-utility tradeoff, revealing LLMs’ ability to comply with benign requests while rejecting offensive ones. Quantitative results on exploit generation tasks indicate the need for further research before LLMs can autonomously exploit systems despite improved performance with increasing coding ability.
Check out the Paper and GitHub page. All credit for this research goes to the researchers of this project. Also, don’t forget to follow us on Twitter. Join our Telegram Channel, Discord Channel, and LinkedIn Group.
If you like our work, you will love our newsletter..
Don’t Forget to join our 40k+ ML SubReddit