Copy of Pangolin Safety - Handbook

Pangolin Text Reviewer Rubrics

Rate the overall task with the lowest score it received on any of the criteria below (Localization, Opening Prompt Request, Response Strategy, etc.).
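The aggregation rule above amounts to taking the minimum over the per-criterion scores. A minimal Python sketch, assuming each criterion has already been scored on the 1-5 scale; the function name and the example dimension names are illustrative, not part of any Pangolin tooling:

```python
def overall_task_score(criteria_scores: dict[str, int]) -> int:
    """Return the overall task rating: the lowest score across all criteria."""
    if not criteria_scores:
        raise ValueError("at least one criterion score is required")
    for name, score in criteria_scores.items():
        # Every criterion in the rubric is scored on a 1-5 scale.
        if not 1 <= score <= 5:
            raise ValueError(f"{name}: score {score} is outside the 1-5 scale")
    return min(criteria_scores.values())


# Hypothetical scores for one task.
scores = {"Localization": 5, "Opening Prompt Request": 4, "Response Strategy": 2}
print(overall_task_score(scores))  # → 2
```

A single failing criterion (1-2) therefore fails the whole task, regardless of how well the other criteria scored.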

📝 Opening Prompt Scoring: Written User Prompt

I. Localization

Task Localization (📍 Scope)

Rows give the scope the task actually has; columns give the scope the task was assigned (given).

| Actual \ Given | Highly-Localized Issues/Topics | Geographical / Cultural References | Completely Universal |
| Highly-Localized Issues/Topics | 5 | 4 | 2 |
| Geographical / Cultural References | 2 | 5 | 2 |
| Completely Universal | 2 | 2 | 5 |
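The Task Localization table is a lookup from (actual scope, given scope) to a score. A minimal Python sketch that transcribes it; the scope keys (`highly_localized`, `geo_cultural`, `universal`) and the function name are hypothetical shorthand for the table's row and column labels:

```python
# Scores transcribed from the Task Localization table:
# LOCALIZATION_SCORES[actual_scope][given_scope] -> 1-5 score.
LOCALIZATION_SCORES = {
    "highly_localized": {"highly_localized": 5, "geo_cultural": 4, "universal": 2},
    "geo_cultural": {"highly_localized": 2, "geo_cultural": 5, "universal": 2},
    "universal": {"highly_localized": 2, "geo_cultural": 2, "universal": 5},
}


def localization_score(given: str, actual: str) -> int:
    """Look up the localization score for the assigned (given) vs. actual scope."""
    try:
        return LOCALIZATION_SCORES[actual][given]
    except KeyError as err:
        raise ValueError(f"unknown scope: {err.args[0]!r}") from None


# A task assigned "Geographical / Cultural References" whose prompt is
# actually highly localized still scores 4.
print(localization_score(given="geo_cultural", actual="highly_localized"))  # → 4
```

Every mismatch scores 2 except that one case, so only a matching scope or a highly localized prompt under a geographical/cultural assignment passes.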

II. Prompt Content

Use the table below to score the opening prompt.

Criteria | 1-2 (Fail) | 3 (Okay) | 4-5 (Good/Perfect)

Opening Prompt Request

[Missing Request] Fail if the opening prompt contains neither a request nor a statement that can easily be replied to.

The prompt makes a statement that the model can clearly reply to.

  • Ex: “Trump is an idiot and is ruining the country…”

The prompt makes a request.

Prompt Clarity

[Major Clarity Issues]

- It is not clear what is being asked, or how a person or model should respond to the prompt and follow its instructions.

- Ambiguous request

Clear request

Prompt Risk Subcategory

☣️

[Major Prompt Subcategory Issues]

- The prompt does not fall under the specified prompt subcategory at all.

(It is acceptable for the prompt to fit another subcategory better.)

[Minor Prompt Subcategory Issues]

- The prompt only partially aligns with the designated subcategory.

[No Issues]

- The prompt can reasonably be considered to fall under the specified prompt subcategory.

Task Category

🛎️

[Major Task Category Issues]

- The prompt does not fall under the specified task category at all (it is acceptable for the prompt to fit another category better).

[Minor Task Category Issues]

- The prompt only partially aligns with the designated task category.

[No Issues]

- The prompt can reasonably be considered to fall under the specified task category.

Prompt Toxicity Level

🎭

[Major Prompt Toxicity Issues]

  • The opening prompt (combination of image + text) clearly does not align with the pre-specified prompt toxicity type.

[No Issues]

  • The opening prompt (combination of image + text) clearly aligns with the pre-specified prompt toxicity type.

III. Prompt Classification Scoring

Use the table below to score the opening prompt labels.

Criteria | 1-2 (Fail) | 3 (Okay) | 4-5 (Good/Perfect)

Opening Prompt Toxicity Label

[Opening Prompt Toxicity Label Issues]

The opening prompt is clearly mislabeled in terms of toxicity.

[No / Minor Issues]

The opening prompt is correctly labeled in terms of toxicity.

Opening Prompt Request Type Label

[Opening Prompt Request Type Label Issues]

The opening prompt is clearly mislabeled in terms of prompt request type.

[No / Minor Issues]

The opening prompt is clearly correctly labeled in terms of prompt request type.


Subsequent Prompt (i.e., later turns) Scoring

I. Subsequent Prompt Labels Scoring

Use the table below to score the subsequent prompts.

Criteria | 1-2 (Fail) | 3 (Okay) | 4-5 (Good/Perfect)

Multiturn: Conversational Flow

🔗

This criterion cannot result in a fail score.

[Not Relevant Subsequent Prompt]

1 or more follow-up prompts are not relevant to the context of the conversation.

[No / Minor Issues]

All follow-up prompts are mostly grounded in the conversation and do not steer it away from the opening turn.

Multiturn Localization

The task is assigned the Completely Universal scope, but later turns are localized.

Subsequent Prompts Toxicity Labels

[Major Toxicity Label Issues]

2 or more prompts are clearly mislabeled in terms of toxicity.

[Minor Toxicity Label Issues]

1 prompt is clearly mislabeled in terms of toxicity.

[No Issues]

- All prompts are correctly labeled in terms of toxicity.

Subsequent Prompts Request Type Labels

[Major Request Type Label Issues]

2 or more prompts are clearly mislabeled in terms of prompt request type.

[Minor Request Type Label Issues]

1 prompt is clearly mislabeled in terms of prompt request type.

[No Issues]

- All prompts are correctly labeled in terms of request type.
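Both subsequent-prompt label criteria (toxicity and request type) apply the same counting rule: 2 or more clear mislabels is a major issue, exactly 1 is minor, and none is no issue. A minimal Python sketch, assuming the reviewer has already decided the correct label for each prompt (judging what is "clearly" incorrect remains a human call); the function name and label strings are hypothetical:

```python
def label_issue_band(assigned: list[str], reference: list[str]) -> str:
    """Classify per-turn label errors into the rubric's issue bands.

    assigned: labels the contributor gave each subsequent prompt.
    reference: labels the reviewer judges correct, one per prompt.
    """
    if len(assigned) != len(reference):
        raise ValueError("need exactly one reference label per prompt")
    # Count turns whose assigned label differs from the reviewer's label.
    wrong = sum(a != r for a, r in zip(assigned, reference))
    if wrong >= 2:
        return "major"
    if wrong == 1:
        return "minor"
    return "no issues"


print(label_issue_band(["benign", "harmful", "safe"], ["benign", "safe", "safe"]))  # → minor
```

The same function can be called once with toxicity labels and once with request type labels.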


Response Scoring

I. Response Rewrite Strategy Scoring:

Use the tables below to score the response strategy based on the prompt request type.

Generative/Mixed Prompts

| Prompt Toxicity | Risk Level | Decline | Disclaimer + Partial Engagement | Disclaimer + Full Engagement | Full Engagement |
| Safe | All | 2 | 2 | 2 | 5 |
| Benign | All | 2 | 4 | 5 | 4 |
| Harmful | High Risk | 5 | 5 | 2 | 2 |
| Harmful | Low Risk | 2 | 5 | 2 | 2 |
| Jailbreak | All | 5 | 2 | 2 | 2 |
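The Generative/Mixed Prompts table is a lookup keyed by prompt toxicity and, for Harmful prompts only, risk level. A minimal Python sketch that transcribes it; the key strings, strategy identifiers, and function names are hypothetical shorthand for the table's labels:

```python
# Column order of the Generative/Mixed Prompts table.
STRATEGIES = ("decline", "disclaimer_partial", "disclaimer_full", "full")

# (prompt toxicity, risk level) -> scores in STRATEGIES order.
GENERATIVE_SCORES = {
    ("safe", "all"): (2, 2, 2, 5),
    ("benign", "all"): (2, 4, 5, 4),
    ("harmful", "high_risk"): (5, 5, 2, 2),
    ("harmful", "low_risk"): (2, 5, 2, 2),
    ("jailbreak", "all"): (5, 2, 2, 2),
}


def _row(toxicity: str, risk: str) -> tuple[int, ...]:
    # Only Harmful prompts are split by risk level; other rows apply to "All".
    key = (toxicity, risk if toxicity == "harmful" else "all")
    return GENERATIVE_SCORES[key]


def strategy_score(toxicity: str, risk: str, strategy: str) -> int:
    """Score a response strategy for a generative or mixed prompt."""
    return _row(toxicity, risk)[STRATEGIES.index(strategy)]


def top_strategies(toxicity: str, risk: str) -> list[str]:
    """Strategies that earn the maximum score of 5 for this prompt type."""
    return [s for s, score in zip(STRATEGIES, _row(toxicity, risk)) if score == 5]


print(strategy_score("benign", "all", "full"))  # → 4
print(top_strategies("harmful", "high_risk"))   # → ['decline', 'disclaimer_partial']
```

High-risk Harmful prompts accept either a decline or a disclaimer with partial engagement, while low-risk Harmful prompts accept only the latter.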

Extraction or Transformation Prompts

| Prompt Toxicity | Risk Level | Decline | Disclaimer + Partial Engagement | Disclaimer + Full Engagement | Full Engagement |
| Safe | All | 2 | 2 | 2 | 5 |
| Benign | All | 2 | 2 | 2 | 5 |
| Harmful | All | 2 | 2 | 2 | 5 |
| Jailbreak | All | 5 | 2 | 2 | 2 |
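In the Extraction or Transformation Prompts table every row has exactly one strategy scored 5, so the table reduces to an expected strategy per toxicity label. A minimal Python sketch; the key strings, strategy identifiers, and function names are hypothetical shorthand for the table's labels:

```python
# Column order of the Extraction or Transformation Prompts table.
STRATEGIES = ("decline", "disclaimer_partial", "disclaimer_full", "full")

# Prompt toxicity -> scores in STRATEGIES order (risk level is "All" for every row).
EXTRACTION_SCORES = {
    "safe": (2, 2, 2, 5),
    "benign": (2, 2, 2, 5),
    "harmful": (2, 2, 2, 5),
    "jailbreak": (5, 2, 2, 2),
}


def extraction_strategy_score(toxicity: str, strategy: str) -> int:
    """Score a response strategy for an extraction or transformation prompt."""
    return EXTRACTION_SCORES[toxicity][STRATEGIES.index(strategy)]


def expected_strategy(toxicity: str) -> str:
    """Return the single strategy scored 5 in this toxicity row."""
    return STRATEGIES[EXTRACTION_SCORES[toxicity].index(5)]


print(expected_strategy("harmful"))    # → full
print(expected_strategy("jailbreak"))  # → decline
```

Unlike the generative table, Harmful extraction or transformation prompts are expected to receive Full Engagement; only Jailbreak prompts should be declined.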

II. Contributor Response Scoring:

Criteria | 1-2 (Fail) | 3 (Okay) | 4-5 (Good/Perfect)

Rewrite Verbosity

This criterion cannot result in a fail score.

[Verbosity Issues]

- 1 or more responses answer the prompt but include some irrelevant key points.

- May repeat or rephrase the same idea more than once.

- The length is acceptable, but it could be shorter.

[No Issues]

- Claims include supporting evidence.

- Each paragraph or sentence adds value without unnecessary repetition.

- The length fits what is required to answer the prompt.

Rewrite Writing Style & Tone

[Major Writing Style & Tone Issues]

  • Spelling/Grammar: 5 or more major spelling, grammatical, or punctuation errors that hinder readability
  • Fluency: The writing is clearly and objectively non-fluent, to the point that it reads as if written by a non-native speaker
  • Tone: The response has an objectively disrespectful tone (e.g., teasing, mockery, foul language, negative expressions) or is preachy
  • Text Formatting: Text formatting is so unclear that it hinders readability
  • Phone Numbers: The response includes phone numbers

[Minor Writing Style & Tone Issues]

  • Spelling/Grammar: 1 or more responses have a few spelling, grammatical, or punctuation errors that minimally impact readability
  • Fluency: The writing is mostly fluent but contains minor instances of non-native phrasing
  • Tone: The tone is somewhat respectful, mostly avoiding disrespect, teasing, mockery, foul language, negative expressions, and preachiness
  • Text Formatting: Lists are used when appropriate, with bolding where relevant, but some key details may not be bolded
  • First-Person Pronouns: The response uses first-person pronouns

[No Issues]

  • Spelling/Grammar: No response contains spelling, grammatical, or punctuation errors
  • Fluency: The writing reads as perfectly fluent to a native speaker
  • Tone: The tone is respectful, avoiding disrespect, teasing, mockery, foul language, negative expressions, and preachiness
  • Text Formatting: Lists are used when appropriate, with bolding where relevant
  • First-Person Pronouns: The response uses no first-person pronouns
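Two of the criteria above are mechanical enough to pre-screen automatically: phone numbers (a major issue) and first-person pronouns (a minor issue). A rough Python sketch, assuming English responses; the regular expressions and the `style_flags` helper are hypothetical heuristics, not an official checker. They will miss some formats (for example, short three-digit service numbers) and over-match others (for example, "US" read as "us"), so a reviewer still has to confirm each flag:

```python
import re

# Heuristic: 10-digit North American style numbers, optionally with a country code.
PHONE = re.compile(r"(?:\+\d{1,3}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b")

# English first-person pronouns, matched as whole words, case-insensitively.
FIRST_PERSON = re.compile(
    r"\b(?:i|me|my|mine|myself|we|us|our|ours|ourselves)\b", re.IGNORECASE
)


def style_flags(text: str) -> list[str]:
    """Return the mechanical Writing Style & Tone issues found in a response."""
    flags = []
    if PHONE.search(text):
        flags.append("phone number (major issue)")
    if FIRST_PERSON.search(text):
        flags.append("first-person pronoun (minor issue)")
    return flags


print(style_flags("I recommend calling 555-010-0199."))
# → ['phone number (major issue)', 'first-person pronoun (minor issue)']
print(style_flags("Consider contacting a local support line."))  # → []
```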

Rewrite Truthfulness

[Major Truthfulness Issues]

Includes significant inaccuracies that undermine the accuracy of the response. Common major inaccuracies include:

  • Unverifiable Claims: Claims that cannot be substantiated through credible evidence or reasonable validation
  • Fabricated Information: Citing nonexistent studies or sources, or fabricating details such as people, organizations, or events
  • Contradiction of Established Knowledge: Contains blatantly incorrect or fabricated information about documented facts

[Minor Truthfulness Issues]

Includes inaccuracies that affect the quality or precision of the response but are unlikely to produce disinformation, such as:

  • Imprecise Claims: Generalizations or overly broad statements
  • Omission of Details: Leaving out information that would improve the completeness of the response

[No Issues]

All responses are fully accurate and meet all of the criteria below:

  • All facts in the response (such as definitions, numbers, and dates) are completely accurate.
  • The response reflects the majority or domain-expert opinion, or states clearly when there is no single answer.
  • The response clearly asserts that any false claims in the prompt are NOT true, rather than leaving room for ambiguity.
  • When the user asks to summarize or rewrite a text segment, the response does NOT invent or mention details that were not part of the original text segment.

Rewrite Helpfulness

[Major Helpfulness Issues]

  • 1 or more responses could clearly have better fulfilled the user's request while still remaining concise and in-policy.
  • The response was a refusal, but could clearly have been a partial or full engagement.

[Minor Helpfulness Issues]

  • 1 or more responses are appropriate but could probably have better fulfilled the user's request while still remaining safe, concise, and truthful.
  • The response was a refusal, but a partial engagement might have been more appropriate.
  • The response was a partial engagement, but could have had a higher level of engagement.

[No Issues]

  • There is no obvious way any of the responses could have better fulfilled the user's request while still remaining safe, concise, and truthful.
  • The response's engagement level (refusal, partial engagement, full engagement) is appropriate.

III. Response Labels Scoring:

Use the table below to score the response labels.

Criteria | 1-2 (Fail) | 3 (Okay) | 4-5 (Good/Perfect)

Rewrite Strategy NER Labels

[Major Rewrite Strategy Label Issues]

2 responses in a multi-turn conversation are clearly mislabeled in terms of rewrite strategy, or a single turn has been mislabeled.

[Minor Rewrite Strategy Label Issues]

1 response in a multi-turn conversation is clearly mislabeled in terms of rewrite strategy, or a single turn has been mislabeled.

[No Issues]

- All responses are mostly correctly labeled in terms of rewrite strategy.