Snapshot saved by the user on 2025-4-12 19:01 from https://app.immersivetranslate.com/word/, with bilingual support provided by Immersive Translate.

Gujarat Video Annotation Content

Quality Control Guidelines

• Quality inspection volume: 250 valid hours.

• Workflow

After receiving the data, the quality inspectors check each segment of the data. The inspection covers the following:

1. Check that the speaker roles are correct:

In each data segment, the first speaker's role should be O1, and this person's role must remain unique and consistent within that piece of audio. The same applies to all subsequent segments. Valid speech segments must carry a speaker role, while invalid speech segments take the fixed role N.
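A minimal Python sketch of the role rule, assuming each segment arrives with a raw speaker key (the hypothetical `speaker` field, for example from a diarization step) and a validity flag. Roles O1, O2, ... are handed out in order of first appearance and reused for the same speaker, while invalid segments always get N.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    speaker: Optional[str]  # hypothetical raw speaker key; None if unknown
    valid: bool


def assign_roles(segments: list[Segment]) -> list[str]:
    """Assign O1, O2, ... by order of first appearance; invalid segments get N."""
    roles: dict[str, str] = {}
    result: list[str] = []
    for seg in segments:
        if not seg.valid or seg.speaker is None:
            result.append("N")  # fixed role for invalid speech
            continue
        if seg.speaker not in roles:
            roles[seg.speaker] = f"O{len(roles) + 1}"  # next unused role ID
        result.append(roles[seg.speaker])  # same speaker keeps the same ID
    return result


segs = [Segment("spk_b", True), Segment("spk_a", True), Segment(None, False), Segment("spk_b", True)]
print(assign_roles(segs))  # ['O1', 'O2', 'N', 'O1']
```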

2. Select the appropriate accent for each role:

By default, each speaker is assigned the standard accent. If a speaker uses a different dialect of Czech, adjust the accent accordingly.

3. Verify that the transcribed text in the transcription box is correct and complete:

A. Correct any text errors so that the transcription reflects the actual spoken content.

B. Follow the principle of "transcribe what is heard." If the speaker pauses, stutters, or pronounces a word incompletely, use "-" to represent it.

C. If a segment contains overlapping speech, right-click and select the "overlap" tag, then enclose the overlapping portion of the text between the opening and closing tags. For example: "we go to [OVERLAP/]school today[/OVERLAP]". This indicates that while the segment's speaker says "school today," another person is speaking at the same time. The second person's speech does not need to be transcribed, because the segment belongs to the primary speaker's ID, not the second person's.

*Note: The "overlap" tags must appear in pairs, enclosing the overlapping content between them. They must never appear on their own.
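The pairing rule can be checked mechanically before inspection. The helper below, `overlap_tags_balanced`, is a hypothetical sketch, not a feature of the annotation tool; it additionally assumes that tags are never nested and that a tag pair must enclose some text.

```python
OPEN, CLOSE = "[OVERLAP/]", "[/OVERLAP]"


def overlap_tags_balanced(text: str) -> bool:
    """True if every [OVERLAP/] is closed by a later [/OVERLAP] around non-empty text."""
    inside = False
    start = 0  # index where the current overlapping text begins
    i = 0
    while i < len(text):
        if text.startswith(OPEN, i):
            if inside:  # a second opener before the first one was closed
                return False
            inside = True
            i += len(OPEN)
            start = i
        elif text.startswith(CLOSE, i):
            if not inside or not text[start:i].strip():  # stray closer or empty pair
                return False
            inside = False
            i += len(CLOSE)
        else:
            i += 1
    return not inside  # an opener left unclosed is invalid


print(overlap_tags_balanced("we go to [OVERLAP/]school today[/OVERLAP]"))  # True
print(overlap_tags_balanced("we go to [OVERLAP/]school today"))  # False
```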

D. If the speech content involves personal privacy, mark the segment as invalid and delete the content in the text box. Right-click and insert the [PIL] tag in its place.

• Beyond the brief rules above, the following rules commonly apply during quality inspection:

Each eligible audio segment should be segmented and annotated sentence by sentence. A sentence that meets any of the following criteria is disqualified and does not need to be segmented:

A. If two people speak at the same time within a single sentence at similar volume, so that their voices overlap significantly, mark the sentence as invalid speech. However, if the overlap is minimal (only one or two words) and the main speaker's content can still be understood, transcribe it normally.

B. If part of a sentence is unclear and its content cannot be determined, the sentence is invalid.

C. If a sentence contains strong noise (environmental or device noise) that makes the main speaker's content difficult to hear, the sentence is invalid.

D. If a sentence contains dropped frames, it is invalid.

E. If a sentence does not consist of normal human speech (for example, an automated customer-service voice, synthesized speech, or a TV/radio broadcast), it is invalid.

F. If a sentence includes non-native language parts, it is invalid.

G. If a sentence involves sensitive information (political, religious, or explicit content, or violence), it is invalid.

• Role ID Check

Within the same segment, identify different speakers with unique IDs, and mark each speaker's gender and accent.
在同一片段,应使用唯一的身份 ID识别不同的说话人,并应标记他们的性别

• Transcription Check

1. Inspectors check the transcribed text against the audio they hear. The transcription must match the spoken language exactly, with no extra, missing, or incorrect words. The general guidelines are as follows:

2. Capitalization: A word that is normally capitalized should be transcribed with a capital letter, following standard writing conventions. For example: "China," "Microsoft."

3. Numerals: Numbers must not be transcribed as Arabic numerals. Instead, write them out in the written form of the respective language. For example:

Original text           Transcribed text
I'm 15 years old        I'm fifteen years old
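For English examples like the one above, the rule can be illustrated with a sketch that spells out standalone numbers from 0 to 999. The assumptions are this sketch's own: it uses the style without "and" ("three hundred five"), hyphenates compounds such as "forty-two" (a hyphen inside a word is an allowed mark), leaves longer numbers, decimals, and grouped numbers like "1,000" unchanged, and does not cover the target language, whose written number forms need their own rules.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
        "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
        "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"]

# A standalone 1-3 digit number: not part of a longer number, a decimal, or "1,000".
NUMBER_RE = re.compile(r"(?<![\w.,])\d{1,3}(?!\w|[.,]\d)")


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 999 in English."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only handles 0-999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def spell_out_numbers(text: str) -> str:
    """Replace each standalone 1-3 digit number in the text with words."""
    return NUMBER_RE.sub(lambda m: number_to_words(int(m.group())), text)


print(spell_out_numbers("I'm 15 years old"))  # I'm fifteen years old
```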

4. Word Spelling

When letters are pronounced individually, transcribe them as capital letters separated by spaces. For example:

Original text           Transcribed text
five thirty pm          five thirty P M
FBI                     F B I
NFC                     N F C
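Whether a token is read letter by letter can only be decided by listening ("NASA", for instance, is usually said as a word), so the sketch below assumes the annotator supplies that decision as a hypothetical set, `LETTER_BY_LETTER`. The function only performs the spacing and upper-casing, and it does not handle tokens with punctuation attached.

```python
# Hypothetical: tokens the annotator heard read out letter by letter (lower-cased).
LETTER_BY_LETTER = {"fbi", "nfc", "pm"}


def space_out_letters(text: str) -> str:
    """Rewrite letter-by-letter tokens as spaced capitals, e.g. 'FBI' -> 'F B I'."""
    words = []
    for word in text.split():
        if word.lower() in LETTER_BY_LETTER:
            words.append(" ".join(word.upper()))  # one capital per spoken letter
        else:
            words.append(word)
    return " ".join(words)


print(space_out_letters("five thirty pm"))  # five thirty P M
```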

5. Abbreviations:

Do not use written abbreviations in the transcription; write out the full words as they are pronounced. For example:

Original text           Transcribed text
This is Dr. Smith       This is doctor Smith
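A similar sketch for abbreviations, assuming a hypothetical, project-specific mapping from written abbreviations to their spoken forms. The mapping must reflect what is actually said in the audio, since some abbreviations (for example "St.") can be pronounced in more than one way.

```python
import re

# Hypothetical mapping from written abbreviations to the words actually spoken.
SPOKEN_FORMS = {"Dr.": "doctor", "Mr.": "mister"}
ABBREV_RE = re.compile(r"\b(?:" + "|".join(re.escape(a) for a in SPOKEN_FORMS) + r")")


def expand_abbreviations(text: str) -> str:
    """Replace each known abbreviation with its spoken form."""
    # Append a space after each expansion so "Dr.Smith" becomes "doctor Smith",
    # then collapse any doubled spaces that this creates.
    expanded = ABBREV_RE.sub(lambda m: SPOKEN_FORMS[m.group()] + " ", text)
    return re.sub(r" {2,}", " ", expanded).strip()


print(expand_abbreviations("This is Dr.Smith"))  # This is doctor Smith
```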

6. Punctuation

A. Use punctuation according to grammar rules.

Punctuation that the speaker says aloud must be transcribed as words. For example, "@" should be transcribed as "at," and ".com" as "dot com."

B. Only the following punctuation marks are allowed in the transcription: comma (,), hyphen (-) when it appears within a word, period (.), exclamation mark (!), single quotation mark ('), and question mark (?). Do not add any other punctuation marks, and every mark you add must follow grammar rules. Enter all punctuation marks with the keyboard in normal English input mode (half-width characters, not full-width).
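A quick pre-check for the punctuation whitelist, sketched under a few assumptions: the special labels from the label table are removed before scanning, letters and combining marks of any script are accepted (which covers Gujarati vowel signs), and digits are reported too, because numbers must be written out. Positional rules, such as a hyphen appearing only inside or at the end of a word, are not checked.

```python
import re
import unicodedata

ALLOWED_PUNCT = set(",-.!'?")
# Special labels are exempt from the punctuation check.
LABEL_RE = re.compile(r"\[(?:N|HM|IVS|OIVS|PIL|OVERLAP/|/OVERLAP)\]")


def disallowed_chars(text: str) -> list[str]:
    """Return characters that are not letters/marks, whitespace, or allowed punctuation."""
    bad = []
    for ch in LABEL_RE.sub(" ", text):
        if ch.isspace() or ch in ALLOWED_PUNCT:
            continue
        # Unicode categories L* (letters) and M* (combining marks, e.g. Gujarati vowel signs).
        if unicodedata.category(ch)[0] in ("L", "M"):
            continue
        bad.append(ch)  # anything else: other punctuation, symbols, digits, full-width marks
    return bad


print(disallowed_chars("email me @ home"))  # ['@']
```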

7. Interjections

Transcribe interjections accurately according to their pronunciation and meaning. Pure laughter does not need to be annotated. However, an interjection that contributes to the meaning of the context must be marked. For example, if one speaker asks "Shall we have dinner together later?" and the other answers "Hmm," that "Hmm" responds to the preceding statement and carries meaning. An interjection that does not contribute to the meaning of the context, such as unconscious humming, does not need to be marked, although marking it is not considered an error.

8. Others

A. Transcribe profanity as spoken; do not substitute letters or symbols for it.

Transcribe internet slang and common online terms according to their common usage.

B. If words or phrases are repeated in the speech, transcribe every repetition.

C. If the meaning of a word or phrase is unclear but its pronunciation is certain (a common personal name, for example), a homophone may be used in its place, as long as the transcribed text matches the pronunciation correctly. When the context makes the meaning clear, choose a word that both matches the pronunciation and conveys the intended meaning.

D. If a word is not fully spoken, add a hyphen (-) at the end of the fragment and leave a space before the next word. For example: "I want to go to s- school." A sentence must end with a complete word: if an incomplete word falls at the end of the sentence, discard it and do not transcribe it.
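The end-of-sentence part of this rule can be applied automatically. The sketch assumes fragments are whitespace-separated tokens ending in "-" (as in "s-"), keeps any final period, exclamation mark, or question mark, and leaves a standalone "-" pause marker alone.

```python
import re

FRAGMENT = re.compile(r"^\S+-$")  # a token such as "s-" that ends in a hyphen


def drop_trailing_fragments(sentence: str) -> str:
    """Discard incomplete words at the end of a sentence, keeping final punctuation."""
    body = sentence.rstrip()
    end = ""
    if body and body[-1] in ".!?":
        body, end = body[:-1].rstrip(), body[-1]  # set the terminal mark aside
    words = body.split()
    while words and FRAGMENT.match(words[-1]):
        words.pop()  # the sentence must end with a complete word
    return " ".join(words) + end


print(drop_trailing_fragments("I want to go to sch-"))  # I want to go to
```

Fragments in the middle of a sentence, such as "s-" in "s- school", are left untouched, as the rule requires.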

• Special labels:

During inspection, add the corresponding special label whenever one of the following situations occurs. Labels must be valid: paired labels must not be missing a partner, capitalization must be consistent, and brackets must be matched.

* To insert a label, right-click at the desired location and select the corresponding label from the list.
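A sketch of these validity checks (known label names, exact capitalization, matched brackets, and equal counts of [OVERLAP/] and [/OVERLAP]), assuming the label set is exactly the one listed in this section's label table. `label_problems` is a hypothetical helper, not part of the annotation tool.

```python
import re

# Label names as listed in the special-label table (exact capitalization).
KNOWN_LABELS = {"N", "HM", "IVS", "OIVS", "PIL", "OVERLAP/", "/OVERLAP"}
BRACKETED = re.compile(r"\[([^\[\]]*)\]")


def label_problems(text: str) -> list[str]:
    """Return a list of label problems found in one transcription line."""
    problems = []
    for m in BRACKETED.finditer(text):
        name = m.group(1)
        if name in KNOWN_LABELS:
            continue
        if name.upper() in KNOWN_LABELS:
            problems.append(f"wrong capitalization: [{name}]")
        else:
            problems.append(f"unknown label: [{name}]")
    # Any bracket left after removing well-formed [...] groups is unmatched.
    leftover = BRACKETED.sub("", text)
    if "[" in leftover or "]" in leftover:
        problems.append("unmatched bracket")
    if text.count("[OVERLAP/]") != text.count("[/OVERLAP]"):
        problems.append("unpaired OVERLAP tag")
    return problems


print(label_problems("I go out for dinner.[n]"))  # ['wrong capitalization: [n]']
```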

Valid segments

Label: none (no noise)
Explanation: Transcribe the heard content according to the guidelines.
Role ID: O1, O2, ...
Transcription example: I go out for dinner.

Label: [N]
Explanation: If a sentence contains noise, add "[N]" at the end of the sentence. The type of noise does not need to be distinguished.
Role ID: O1, O2, ...
Transcription example: I go out for dinner.[N]

Label: [HM]
Explanation: If a speaker's content is rap, add "[HM]" at the end of the sentence.
Role ID: O1, O2, ...
Transcription example: I'm beginning to feel like a Rap God, Rap God[HM]

Label: [OVERLAP/][/OVERLAP]
Explanation: If speech overlaps and one party is particularly clear, transcribe only the speech of the clear speaker. Mark that speaker's role and use the label pair to enclose the affected text.
Role ID: O1, O2, ...
Transcription example: I go out [OVERLAP/]for dinner[/OVERLAP].

Invalid segments

Label: [IVS] (invalid speech segments with human voice)
Explanation: Only noise segments longer than 0.5 seconds are marked. This covers speech overlapping at similar volume, dropped frames in the audio, audio clipping, echo in the speech, non-standard speaking styles such as singing or a strained voice, speech in a non-target language, and cases where individual words are unclear or cannot be transcribed because of noise interference.
Role ID: N
Transcription example: [IVS]

Label: [OIVS] (invalid speech segments with non-human voice)
Explanation: Noise segments longer than 0.5 seconds can be marked, for example: TV background noise with human voices; a program announcer's commentary or advertisements; music with accompanying vocals. In these cases, mark the segments as noise to distinguish them from regular speech.
Role ID: N
Transcription example: [OIVS]

Sensitive information

Label: [PIL]
Explanation: The speech contains private information of the person being recorded, such as a detailed address, phone number, ID card number, bank card number, social security number, or passport number.
Role ID: N
Transcription example: [PIL]

• Qualification Rate Requirements

The qualification rate requirements after quality inspection are as follows:

Text qualification rate: 95% word accuracy
Role ID: 90% sentence accuracy
Special labels: 90% sentence accuracy
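The guidelines do not say how these rates are computed. One common choice, assumed here, defines word accuracy as 1 - (word-level edit distance / number of reference words), totalled over all inspected sentences, with the inspector-corrected text as the reference. Sentence accuracy is then the share of sentences whose role ID (or labels) are correct. The thresholds are the ones listed above; the function names are hypothetical.

```python
def word_edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution (0 if words match)
        prev = cur
    return prev[-1]


def corpus_word_accuracy(pairs: list[tuple[str, str]]) -> float:
    """pairs: (inspector-corrected reference, delivered transcription)."""
    errors = sum(word_edit_distance(r.split(), h.split()) for r, h in pairs)
    total = sum(len(r.split()) for r, _ in pairs)
    return max(0.0, 1.0 - errors / total) if total else 1.0


def sentence_accuracy(correct_flags: list[bool]) -> float:
    """Share of sentences judged correct (e.g. role ID or special labels)."""
    return sum(correct_flags) / len(correct_flags) if correct_flags else 1.0


def meets_requirements(text_acc: float, role_acc: float, label_acc: float) -> bool:
    return text_acc >= 0.95 and role_acc >= 0.90 and label_acc >= 0.90
```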