In-The-Wild Jailbreak Prompts on LLMs

This is the official repository for the ACM CCS 2024 paper "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models, by Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, and Yang Zhang.

In this project, employing our new framework JailbreakHub, we conduct the first measurement study on jailbreak prompts in the wild, with 15,140 prompts collected from December 2022 to December 2023 (including 1,405 jailbreak prompts).

Check out our website here.

Disclaimer. This repo contains examples of harmful language. Reader discretion is recommended. This repo is intended for research purposes only. Any misuse is strictly prohibited.

Data

Prompts

Overall, we collect 15,140 prompts from four platforms (Reddit, Discord, websites, and open-source datasets) between December 2022 and December 2023. Among these prompts, we identify 1,405 jailbreak prompts. To the best of our knowledge, this dataset is the largest collection of in-the-wild jailbreak prompts.

The data are provided in the data directory of this repository.

Statistics of our data sources. UA refers to user accounts; Adv UA refers to adversarial user accounts.

Platform	Source	# Posts	# UA	# Adv UA	# Prompts	# Jailbreaks	Prompt Time Range
Reddit	r/ChatGPT	163,549	147	147	176	176	2023.02-2023.11
Reddit	r/ChatGPTPromptGenius	3,536	305	21	654	24	2022.12-2023.11
Reddit	r/ChatGPTJailbreak	1,602	183	183	225	225	2023.02-2023.11
Discord	ChatGPT	609	259	106	544	214	2023.02-2023.12
Discord	ChatGPT Prompt Engineering	321	96	37	278	67	2022.12-2023.12
Discord	Spreadsheet Warriors	71	3	3	61	61	2022.12-2023.09
Discord	AI Prompt Sharing	25	19	13	24	17	2023.03-2023.04
Discord	LLM Promptwriting	184	64	41	167	78	2023.03-2023.12
Discord	BreakGPT	36	10	10	32	32	2023.04-2023.09
Website	AIPRM	-	2,777	23	3,930	25	2023.01-2023.06
Website	FlowGPT	-	3,505	254	8,754	405	2022.12-2023.12
Website	JailbreakChat	-	-	-	79	79	2023.02-2023.05
Dataset	AwesomeChatGPTPrompts	-	-	-	166	2	-
Dataset	OCR-Prompts	-	-	-	50	0	-
Total		169,933	7,308	803	15,140	1,405	2022.12-2023.12
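As a quick sanity check on the statistics above, the per-source prompt and jailbreak counts can be summed and compared with the Total row, and the share of jailbreak prompts computed for each source. This is a minimal sketch that uses only the numbers from the table; the label "ChatGPT (Discord)" is introduced here just to distinguish the Discord server from r/ChatGPT, and the helper names `totals` and `jailbreak_share` are hypothetical.

```python
# Per-source (# Prompts, # Jailbreaks) counts copied from the statistics table.
# "ChatGPT (Discord)" is an illustrative label for the Discord "ChatGPT" server.
SOURCES = {
    "r/ChatGPT": (176, 176),
    "r/ChatGPTPromptGenius": (654, 24),
    "r/ChatGPTJailbreak": (225, 225),
    "ChatGPT (Discord)": (544, 214),
    "ChatGPT Prompt Engineering": (278, 67),
    "Spreadsheet Warriors": (61, 61),
    "AI Prompt Sharing": (24, 17),
    "LLM Promptwriting": (167, 78),
    "BreakGPT": (32, 32),
    "AIPRM": (3930, 25),
    "FlowGPT": (8754, 405),
    "JailbreakChat": (79, 79),
    "AwesomeChatGPTPrompts": (166, 2),
    "OCR-Prompts": (50, 0),
}


def totals(sources):
    """Return (total prompts, total jailbreaks) summed over all sources."""
    prompts = sum(p for p, _ in sources.values())
    jailbreaks = sum(j for _, j in sources.values())
    return prompts, jailbreaks


def jailbreak_share(sources):
    """Return the fraction of each source's prompts that are jailbreak prompts."""
    return {name: j / p for name, (p, j) in sources.items() if p > 0}


if __name__ == "__main__":
    prompts, jailbreaks = totals(SOURCES)
    print(f"{prompts} prompts, {jailbreaks} jailbreaks ({jailbreaks / prompts:.1%})")
    # List sources from the highest to the lowest jailbreak share.
    for name, share in sorted(jailbreak_share(SOURCES).items(), key=lambda kv: -kv[1]):
        print(f"{name:28s} {share:6.1%}")
```

Running it prints `15140 prompts, 1405 jailbreaks (9.3%)`, matching the Total row, followed by the per-source shares. The user-account columns are deliberately left out: their per-source values sum to slightly more than the reported totals, which may reflect accounts counted only once across sources.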

Question Set

To evaluate the effectiveness of jailbreak prompts, we construct a question set comprising 390 questions across 13 forbidden scenarios adopted from the OpenAI Usage Policy.

We exclude the Child Sexual Abuse scenario from our evaluation and focus on the remaining 13 scenarios: Illegal Activity, Hate Speech, Malware Generation, Physical Harm, Economic Harm, Fraud, Pornography, Political Lobbying, Privacy Violence, Legal Opinion, Financial Advice, Health Consultation, and Government Decision.
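The README states only the total number of questions. Assuming they are split evenly across scenarios (an assumption, not stated here), 390 / 13 gives 30 questions per scenario. The sketch below lists the 13 evaluated scenarios and checks that arithmetic; the names `SCENARIOS` and `TOTAL_QUESTIONS` are illustrative.

```python
# The 13 forbidden scenarios used in the evaluation (Child Sexual Abuse is excluded).
SCENARIOS = [
    "Illegal Activity", "Hate Speech", "Malware Generation", "Physical Harm",
    "Economic Harm", "Fraud", "Pornography", "Political Lobbying",
    "Privacy Violence", "Legal Opinion", "Financial Advice",
    "Health Consultation", "Government Decision",
]
TOTAL_QUESTIONS = 390

# Assuming an even split, divmod yields the per-scenario count and any leftover.
questions_per_scenario, remainder = divmod(TOTAL_QUESTIONS, len(SCENARIOS))

if __name__ == "__main__":
    print(f"{len(SCENARIOS)} scenarios, {questions_per_scenario} questions each, remainder {remainder}")
```

A zero remainder confirms that 390 questions divide evenly into 13 scenarios of 30.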

The question set is provided in the data directory of this repository.

Code

Evaluator - ChatGLMEval

cd code/ChatGLMEval
# Before running, add your data paths to df_path_list in run_evaluator.py
python run_evaluator.py

Semantics Visualization

See code/semantics_visualization/visualize.ipynb.

Ethics

We acknowledge that data collected online can contain personal information. We therefore follow standard best practices to ensure that our study adheres to ethical principles, such as not attempting to deanonymize any user and reporting results in aggregate. Since this study involved only publicly available data and no interaction with participants, our Institutional Review Boards (IRB) do not regard it as human subjects research. Nonetheless, because one of our goals is to measure the risk of LLMs answering harmful questions, we inevitably disclose how a model can generate hateful content, which may raise concerns about potential misuse. However, we strongly believe that raising awareness of the problem is even more important, as it can inform LLM vendors and the research community as they develop stronger safeguards, and contribute to the more responsible release of these models.

We have responsibly disclosed our findings to the relevant LLM vendors.

Citation

If you find this useful in your research, please consider citing:

@inproceedings{SCBSZ24,
      author = {Xinyue Shen and Zeyuan Chen and Michael Backes and Yun Shen and Yang Zhang},
      title = {{``Do Anything Now'': Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models}},
      booktitle = {{ACM SIGSAC Conference on Computer and Communications Security (CCS)}},
      publisher = {ACM},
      year = {2024}
}

License

jailbreak_llms is licensed under the terms of the MIT license. See LICENSE for more details.
