Workshop 6 - in class handout

IPA410 Digital Methods in Practice

Workshop 6: Sentiment Analysis

Wednesday May 7 2025

Thursday May 8 2025

Dr. Isabella Magni

Today in class we will experiment with sentiment analysis. This handout includes all the necessary links and all the steps we will take to analyse simple text prompts. You can follow along in class.

Sentiment analysis: some context and existing tools

There are many commercial sentiment analysis tools that companies use to understand reviews and ratings of their products, and perform market research, among other things. Here are some of them:

These are all commercial tools, mostly behind a paywall, that are specifically designed for corporations to perform data analysis (including sentiment analysis) to evaluate their own brand and market space.

During our workshop today, we will not be using any of these tools. Instead, we will work with Google Colab notebooks that we wrote ourselves for our specific needs. These are very simple notebooks that take us step by step through simple sentiment analysis tasks and through different approaches to performing sentiment analysis. Specifically, we will be:

  • Using a pre-trained model
  • Using a zero-shot classification model (pretrained)
  • Training a model to perform sentiment analysis

What is Google Colab? And why do we use it?

A Google Colab notebook is a list of cells. Cells contain either explanatory text or executable code and its output. You can click a cell to select it.

Text cells

Below is an example of a text cell. Text cells usually contain explanatory text and necessary context regarding your code.

You can double-click to edit these cells. Text cells use markdown syntax.

Code cells

Below is an example of a code cell. Once the toolbar button indicates CONNECTED, click in the cell to select it and execute the contents in the following ways:

  • Click the Play icon in the left gutter of the cell;

There are additional options for running some or all cells in the Runtime menu.
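
As a minimal illustration of what a code cell might contain (this snippet is not taken from the workshop notebooks, and the variable name is only an example), a cell holds ordinary Python and shows whatever it prints underneath:

```python
# Compute a value and print it; in Colab the printed text appears below the cell.
seconds_in_a_day = 24 * 60 * 60
print(seconds_in_a_day)
```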

Using a pretrained model

This is a Google Colab notebook that lets you experiment first with a pre-trained sentiment analysis model (cells 1-5), then with a pre-trained zero-shot classification model (cells 6-12). Let’s walk through and run all the cells, one by one. You can save a copy of this notebook in your Google Drive and follow along by running the cells.
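
The two approaches can be sketched roughly as follows. This is a minimal sketch under assumptions, not the notebook's actual code: it assumes the `flair` and `transformers` packages are installed, that the first model is Flair's `en-sentiment` classifier, and that the zero-shot model is `facebook/bart-large-mnli` (chosen here only for illustration). The helper names `flair_sentiment`, `zero_shot` and `ranked_labels` are hypothetical.

```python
def flair_sentiment(text):
    """Classify one text with Flair's pre-trained English sentiment model."""
    # Imported inside the function so this file loads even before flair is installed.
    from flair.data import Sentence
    from flair.models import TextClassifier

    classifier = TextClassifier.load("en-sentiment")  # assumption: downloaded on first use
    sentence = Sentence(text)
    classifier.predict(sentence)          # attaches the predicted label to the sentence
    label = sentence.labels[0]
    return label.value, label.score       # e.g. a label name and a confidence score


def zero_shot(text, candidate_labels, model_name="facebook/bart-large-mnli"):
    """Score a text against classes that we choose ourselves at prediction time."""
    from transformers import pipeline

    classifier = pipeline("zero-shot-classification", model=model_name)
    result = classifier(text, candidate_labels=candidate_labels)
    return ranked_labels(result)


def ranked_labels(result):
    """Pair each label with its score, highest score first."""
    pairs = zip(result["labels"], result["scores"])
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

For example, `zero_shot("Macbeth isn't bad.", ["positive", "negative", "neutral"])` would return the three labels ordered by score. Each call reloads the model, which keeps the sketch short but is slow; in a notebook you would typically load the classifier once in its own cell.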

What have we learnt from experimenting with these two different sentiment analysis models?

  • sentiment analysis is just a type of classification;
  • zero-shot classification is possible and potentially useful;
  • both models we experimented with have their limitations;
  • we can get fairly reliable results with both models, on a broad scale, but we might miss out on more nuanced results.

You can now experiment with your own texts and classes, after trying the examples we provided (the section "Experiment with your own texts" at the end of this handout explains how). Some more potential sentences you could try:

  • Oh, great! Another play which involves witchcraft..
  • Shakespeare is not the worst writer I have ever encountered.
  • Macbeth isn’t bad.
  • There’s no way I am reading all of Shakespeare’s plays.
  • I’m not sure I should watch Macbeth the movie, after reading the play.
  • After learning about Shakespeare, I’m on cloud nine!
  • Macbeth is the bee’s knees.
  • I read literary reviews about Macbeth. They were neither that funny, nor too witty.
  • I felt really glad that the movie Macbeth ended.
  • I read all of Shakespeare’s plays. They are not bad.
  • Macbeth was only kind of good.
  • The plot was good, but the characters are uncompelling and the dialog is not great.
  • Reading Macbeth has never been so entertaining.
  • I watched Macbeth the movie. I won't say that the movie is astounding and I wouldn't claim that the movie is too banal either.
  • Macbeth was too good.
  • I watched Macbeth the movie. There are slow and repetitive parts, but it has just enough spice to keep it interesting.
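
To try many sentences in one go, a small loop helps. The sketch below is an illustration under assumptions: `classify_all` and `keyword_stub` are hypothetical names, and `keyword_stub` is a deliberately naive keyword counter standing in for a real model so that the loop runs without any downloads. It also hints at why sentences like the ones above are hard: counting cue words misreads both negation and sarcasm.

```python
import re

# Tiny hand-picked cue lists, for illustration only.
POSITIVE_WORDS = {"good", "great", "glad", "entertaining", "interesting"}
NEGATIVE_WORDS = {"bad", "worst", "slow", "banal", "repetitive"}


def keyword_stub(text):
    """Naive stand-in classifier (NOT a real sentiment model): count cue words."""
    words = re.findall(r"[a-z]+", text.lower())
    balance = sum(w in POSITIVE_WORDS for w in words) - sum(w in NEGATIVE_WORDS for w in words)
    if balance > 0:
        return "POSITIVE"
    if balance < 0:
        return "NEGATIVE"
    return "NEUTRAL"


def classify_all(sentences, classifier):
    """Apply `classifier` (any function that takes a text) to each sentence."""
    return [(text, classifier(text)) for text in sentences]


sentences = [
    "Macbeth isn't bad.",
    "Oh, great! Another play which involves witchcraft..",
]
for text, label in classify_all(sentences, keyword_stub):
    print(f"{label:8} {text}")
```

Here the stub labels "Macbeth isn't bad." as negative and the sarcastic "Oh, great!" as positive, both arguably wrong. Passing a function that wraps a real model in place of `keyword_stub` lets you compare how the notebook's models handle the same sentences.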

Training a model

This Google Colab notebook tries to actually train a sentiment analysis model based on DistilBERT, using IMDB reviews. This is a smaller-scale version of what Flair did to create the sentiment analysis model we used in the previous notebook. Let’s walk through and run all the cells, one by one. You can save a copy of this notebook in your Google Drive and follow along by running the cells.
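
For orientation, a training run of this kind might look roughly like the sketch below. It is an assumption-laden illustration, not the notebook's code: it assumes the Hugging Face `datasets` and `transformers` packages, the `distilbert-base-uncased` checkpoint and the public `imdb` dataset, and the names `fine_tune`, `accuracy`, `train_size` and `eval_size` are hypothetical. The small default subset sizes are there only to keep a trial run short.

```python
def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    expected = list(expected)
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


def fine_tune(train_size=2000, eval_size=500, output_dir="distilbert-imdb-sketch"):
    """Fine-tune DistilBERT on a subset of IMDB reviews and return test accuracy."""
    # Heavy imports live inside the function so the helper above works on its own.
    from datasets import load_dataset
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    checkpoint = "distilbert-base-uncased"
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    imdb = load_dataset("imdb")  # labels: 0 = negative, 1 = positive

    def tokenize(batch):
        # Pad to a fixed length so the default batching needs no extra collator.
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=256)

    train = imdb["train"].shuffle(seed=42).select(range(train_size)).map(tokenize, batched=True)
    test = imdb["test"].shuffle(seed=42).select(range(eval_size)).map(tokenize, batched=True)

    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    args = TrainingArguments(output_dir=output_dir, num_train_epochs=1,
                             per_device_train_batch_size=16, report_to="none")
    trainer = Trainer(model=model, args=args, train_dataset=train, eval_dataset=test)
    trainer.train()

    logits = trainer.predict(test).predictions      # one row of two scores per review
    predicted = logits.argmax(axis=-1).tolist()     # pick the higher-scoring class
    return accuracy(predicted, test["label"])
```

Calling `fine_tune()` on a GPU runtime trains for one epoch on the chosen subset and returns the accuracy on held-out reviews. Raising `train_size` is one way to explore how much the amount of training data matters.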

What have we learnt from experimenting with training a model?

  • the model started out not working very well;
  • it needed at least 50,000 sample reviews as training data to output better results;
  • the training process is complex and slow.

You can now experiment with your own texts, after trying the examples we provided. Some more potential sentences you could try:

  • The movie was too good.
  • There are slow and repetitive parts in this movie, but it has just enough spice to keep it interesting.
  • The script is not fantastic, but the acting is decent and the cinematography is excellent!
  • This movie is one of the most compelling variations on this theme.
  • This movie is one of the least compelling variations on this theme.
  • This movie is at least compelling as a variation on the theme.
  • I didn’t hate watching this movie, but it was a bit slow.
  • Wonderful! Yet another action movie..
  • This is a unique movie, but I’m not sure I’d recommend it.
  • I don’t like this movie at all.

Final reflections

What have you learned about sentiment analysis? And about the different models we used?


Experiment with your own texts

  1. Save a copy of the Google Colab notebook in your Drive, and use this copy to follow the next steps.
  2. Run all the cells for example 1 (cells 1-5). To run a cell, click on the Play button to the left of the cell, and wait until it has finished running.

Running cell 1

Running cell 2

Running cell 3

Running cell 4

Running cell 5

  3. Now change the input text in cell 5 and run the cell again. Then go back to the previous cell (cell 4) and re-run it. This will give you the new analysis results, based on the new sentence you provided.

Insert new text in cell 5, and then run the cell again

Re-run cell 4 and wait for the new results

You can do a similar process when experimenting with the second exercise, the zero-shot classification model (cells 6-12). First run all the cells once, then add in your own text in cell 11 (the second to last) and run the cell. Then go back to cell number 9 (the one that starts with the comment # Here we can see the results) and run that cell again to see the new results.

You can also experiment with changing both the input text and the parameters. In that case you can change the text and the parameters in cell 12 (the last one), and run the cell. Then go back to cell number 9 (the one that starts with the comment # Here we can see the results) and run that cell again to see the new results.
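
Why does the results cell need to be run again? A notebook runtime keeps variables between cells, and a cell's output only changes when that cell itself is re-run. The toy sketch below mimics this with a plain dictionary. It is an illustration only: `runtime`, `run_input_cell` and `run_results_cell` are hypothetical stand-ins, not code from the workshop notebooks.

```python
runtime = {}  # stands in for the variables the Colab runtime keeps between cells


def run_input_cell(text):
    """Like the cell where you type your sentence: store a new input text."""
    runtime["text"] = text


def run_results_cell():
    """Like the results cell: analyse whatever text is currently stored."""
    runtime["result"] = f"analysed: {runtime['text']}"


run_input_cell("Macbeth was too good.")
run_results_cell()
first = runtime["result"]

run_input_cell("Macbeth isn't bad.")
stale = runtime["result"]   # still the old result: the results cell has not run yet
run_results_cell()
fresh = runtime["result"]   # now reflects the new sentence

print(first)
print(stale)
print(fresh)
```

The second printed line repeats the first, because changing the input alone does not refresh the output; only the third line reflects the new sentence.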