
[CCS'24] A dataset of 15,140 ChatGPT prompts collected from Reddit, Discord, websites, and open-source datasets (including 1,405 jailbreak prompts).

verazuo/jailbreak_llms

In-The-Wild Jailbreak Prompts on LLMs

arXiv: paper | website: online | dataset: released | license: MIT

This is the official repository for the ACM CCS 2024 paper "Do Anything Now": Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models, by Xinyue Shen, Zeyuan Chen, Michael Backes, Yun Shen, and Yang Zhang.

In this project, employing our new framework JailbreakHub, we conduct the first measurement study on jailbreak prompts in the wild, with 15,140 prompts collected from December 2022 to December 2023 (including 1,405 jailbreak prompts).

Check out our website here.

Disclaimer. This repo contains examples of harmful language. Reader discretion is recommended. This repo is intended for research purposes only. Any misuse is strictly prohibited.

Data

Prompts

Overall, we collected 15,140 prompts from four platforms (Reddit, Discord, websites, and open-source datasets) between December 2022 and December 2023. Among these prompts, we identified 1,405 jailbreak prompts. To the best of our knowledge, this dataset is the largest collection of in-the-wild jailbreak prompts.

The data are provided here.

Statistics of our data source. (Adv) UA refers to (adversarial) user accounts.

| Platform | Source | # Posts | # UA | # Adv UA | # Prompts | # Jailbreaks | Prompt Time Range |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Reddit | r/ChatGPT | 163549 | 147 | 147 | 176 | 176 | 2023.02-2023.11 |
| Reddit | r/ChatGPTPromptGenius | 3536 | 305 | 21 | 654 | 24 | 2022.12-2023.11 |
| Reddit | r/ChatGPTJailbreak | 1602 | 183 | 183 | 225 | 225 | 2023.02-2023.11 |
| Discord | ChatGPT | 609 | 259 | 106 | 544 | 214 | 2023.02-2023.12 |
| Discord | ChatGPT Prompt Engineering | 321 | 96 | 37 | 278 | 67 | 2022.12-2023.12 |
| Discord | Spreadsheet Warriors | 71 | 3 | 3 | 61 | 61 | 2022.12-2023.09 |
| Discord | AI Prompt Sharing | 25 | 19 | 13 | 24 | 17 | 2023.03-2023.04 |
| Discord | LLM Promptwriting | 184 | 64 | 41 | 167 | 78 | 2023.03-2023.12 |
| Discord | BreakGPT | 36 | 10 | 10 | 32 | 32 | 2023.04-2023.09 |
| Website | AIPRM | - | 2777 | 23 | 3930 | 25 | 2023.01-2023.06 |
| Website | FlowGPT | - | 3505 | 254 | 8754 | 405 | 2022.12-2023.12 |
| Website | JailbreakChat | - | - | - | 79 | 79 | 2023.02-2023.05 |
| Dataset | AwesomeChatGPTPrompts | - | - | - | 166 | 2 | - |
| Dataset | OCR-Prompts | - | - | - | 50 | 0 | - |
| Total | | 169,933 | 7,308 | 803 | 15,140 | 1,405 | 2022.12-2023.12 |
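
As a quick sanity check on the statistics, the per-source prompt and jailbreak counts can be summed and compared with the Total row. The sketch below hard-codes the figures from the table; the names `ROWS`, `totals`, and `per_platform` are illustrative and not part of the repository. Only the # Prompts and # Jailbreaks columns are checked here.

```python
from collections import defaultdict

# (platform, source, n_prompts, n_jailbreaks), copied from the statistics table.
# Illustrative structure only; the repository does not ship this list.
ROWS = [
    ("Reddit", "r/ChatGPT", 176, 176),
    ("Reddit", "r/ChatGPTPromptGenius", 654, 24),
    ("Reddit", "r/ChatGPTJailbreak", 225, 225),
    ("Discord", "ChatGPT", 544, 214),
    ("Discord", "ChatGPT Prompt Engineering", 278, 67),
    ("Discord", "Spreadsheet Warriors", 61, 61),
    ("Discord", "AI Prompt Sharing", 24, 17),
    ("Discord", "LLM Promptwriting", 167, 78),
    ("Discord", "BreakGPT", 32, 32),
    ("Website", "AIPRM", 3930, 25),
    ("Website", "FlowGPT", 8754, 405),
    ("Website", "JailbreakChat", 79, 79),
    ("Dataset", "AwesomeChatGPTPrompts", 166, 2),
    ("Dataset", "OCR-Prompts", 50, 0),
]


def totals(rows):
    """Return (total prompts, total jailbreak prompts) over all sources."""
    return sum(r[2] for r in rows), sum(r[3] for r in rows)


def per_platform(rows):
    """Aggregate (prompts, jailbreaks) by platform."""
    agg = defaultdict(lambda: [0, 0])
    for platform, _source, prompts, jailbreaks in rows:
        agg[platform][0] += prompts
        agg[platform][1] += jailbreaks
    return {platform: tuple(counts) for platform, counts in agg.items()}


if __name__ == "__main__":
    n_prompts, n_jailbreaks = totals(ROWS)
    print(n_prompts, n_jailbreaks)
    for platform, (p, j) in per_platform(ROWS).items():
        print(f"{platform}: {j}/{p} prompts are jailbreaks ({j / p:.1%})")
```

The summed counts come to 15,140 prompts and 1,405 jailbreak prompts, matching the Total row.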

Question Set

To evaluate the effectiveness of jailbreak prompts, we construct a question set comprising 390 questions across 13 forbidden scenarios adopted from the OpenAI Usage Policy.

We exclude the Child Sexual Abuse scenario from our evaluation and focus on the remaining 13 scenarios: Illegal Activity, Hate Speech, Malware Generation, Physical Harm, Economic Harm, Fraud, Pornography, Political Lobbying, Privacy Violence, Legal Opinion, Financial Advice, Health Consultation, and Government Decision.

The question set is provided here.
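
The shape of the question set can be written down directly from the description above. The sketch below lists the 13 scenario names and, assuming the 390 questions are split evenly across scenarios (an assumption; the text here gives only the totals), derives the number of questions per scenario. The constant and function names are illustrative and not taken from the repository.

```python
# The 13 forbidden scenarios named above (Child Sexual Abuse is excluded).
SCENARIOS = [
    "Illegal Activity", "Hate Speech", "Malware Generation", "Physical Harm",
    "Economic Harm", "Fraud", "Pornography", "Political Lobbying",
    "Privacy Violence", "Legal Opinion", "Financial Advice",
    "Health Consultation", "Government Decision",
]
TOTAL_QUESTIONS = 390


def questions_per_scenario(total, scenarios):
    """Even split of questions across scenarios (assumed, not confirmed here).

    Raises ValueError if the total does not divide evenly.
    """
    per_scenario, remainder = divmod(total, len(scenarios))
    if remainder:
        raise ValueError("questions do not divide evenly across scenarios")
    return per_scenario


if __name__ == "__main__":
    print(len(SCENARIOS), questions_per_scenario(TOTAL_QUESTIONS, SCENARIOS))
```

Under the even-split assumption, each scenario holds 30 questions.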

Code

Evaluator - ChatGLMEval

cd code/ChatGLMEval
# Add your data paths to df_path_list in run_evaluator.py before running
python run_evaluator.py

Semantics Visualization

See code/semantics_visualization/visualize.ipynb.

Ethics

We acknowledge that data collected online can contain personal information. We therefore adopt standard best practices to ensure that our study follows ethical principles, such as not attempting to deanonymize any user and reporting results only in aggregate. Since this study involved only publicly available data and no interaction with participants, our Institutional Review Boards (IRB) do not regard it as human subjects research. Nonetheless, since one of our goals is to measure the risk of LLMs answering harmful questions, disclosing how a model can generate hateful content is unavoidable. This may raise concerns about potential misuse. However, we strongly believe that raising awareness of the problem is even more crucial, as it can inform LLM vendors and the research community, helping them develop stronger safeguards and contributing to the more responsible release of these models.

We have responsibly disclosed our findings to related LLM vendors.

Citation

If you find this useful in your research, please consider citing:

@inproceedings{SCBSZ24,
      author = {Xinyue Shen and Zeyuan Chen and Michael Backes and Yun Shen and Yang Zhang},
      title = {{``Do Anything Now'': Characterizing and Evaluating In-The-Wild Jailbreak Prompts on Large Language Models}},
      booktitle = {{ACM SIGSAC Conference on Computer and Communications Security (CCS)}},
      publisher = {ACM},
      year = {2024}
}

License

jailbreak_llms is licensed under the terms of the MIT license. See LICENSE for more details.
