Xinyan Zhao, Yuan Sun, Wenlin Liu, Chau-Wai Wong, Tailoring generative AI chatbots for multiethnic communities in disaster preparedness communication: extending the CASA paradigm, Journal of Computer-Mediated Communication, Volume 30, Issue 1, January 2025, zmae022, https://doi.org/10.1093/jcmc/zmae022
Abstract
This study is among the first to develop different prototypes of generative artificial intelligence (GenAI) chatbots powered by GPT-4 to communicate hurricane preparedness information to diverse residents. Drawing from the Computers Are Social Actors paradigm and the literature on disaster vulnerability and cultural tailoring, we conducted a between-subjects experiment with 441 Black, Hispanic, and Caucasian residents of Florida. Our results suggest that GenAI chatbots varying in tone formality and cultural tailoring significantly influence perceptions of their friendliness and credibility, which, in turn, relate to hurricane preparedness outcomes. These results highlight the potential of using GenAI chatbots to improve diverse communities’ disaster preparedness.
Open Science Framework Badge

The components of the research methods and analyses needed to reproduce the reported study are publicly available.
Lay Summary
This study looks at how generative artificial intelligence (GenAI) chatbots help individuals from different racial groups prepare for hurricanes in Florida. We create and test GPT-4 chatbots that use different tones and tailor their messages to fit people’s cultural backgrounds. Our results show that chatbots with an informal tone are seen as more friendly, whereas those with either a formal tone or one that fits people’s cultural backgrounds are seen as more credible. Credible chatbots encourage people to seek, share, and act on disaster information. Overall, GenAI chatbots can create human-like and culturally sensitive disaster communication for diverse communities.
Generative artificial intelligence (GenAI), an emerging form of AI that produces customized content through generative models, presents transformative potential in disaster preparedness, response, and recovery (Bari et al., 2023). The past decade has seen a rapid development of AI-powered technologies used for disaster management. In disaster preparedness, AI has enhanced disaster predictions and early warnings during earthquakes, hurricanes, and floods (Lin et al., 2021). In disaster rescue and relief, AI supports crowd-sourced applications such as “live maps” that harness collective efforts to improve disaster relief outcomes (Ghaffarian et al., 2019).
Among various AI applications, chatbots stand out as one of the most promising tools for disaster management agencies to communicate disaster preparedness information to the public. Still in its infancy, research has demonstrated the versatility of chatbots in improving customer service experience, healthcare outcomes, and organization–public relationships (e.g., Men et al., 2023). In disaster and risk communication, generative AI chatbots have the potential to transform one-way, generic communication into interactive and tailored information for different community members. Such cultural tailoring is crucial for serving culturally and linguistically diverse communities, which often become hard-to-reach populations due to language barriers and ingrained communication preferences (Howard et al., 2017; Liu & Zhao, 2023). These communities are also the focus of disaster vulnerability research, as they suffer from deteriorated disaster outcomes more than other communities (Jay et al., 2023).
Recent human-computer interaction (HCI) research has demonstrated that advanced machine-learning technologies have significantly improved chatbots’ ability to mimic human interactions (Goyal et al., 2018) and made communication more interactive and personalized (Hancock et al., 2020). However, distrust remains an issue due to perceived deficiencies of chatbots in empathy and contextual understanding (e.g., Zhou et al., 2023). In high-stakes contexts like disaster communication, the use of chatbots could pose significant challenges, especially for multiethnic communities. This is due to historical factors such as cultural insensitivity and systemic racism, which contribute to long-standing mistrust among marginalized communities toward the government (Best et al., 2021).
In disasters, individuals’ interaction with public agencies is primarily driven by their informational needs (Liu, 2022; Liu & Zhao, 2023). To help organizations meet the informational needs of diverse cultural groups, this study designs and tests GenAI chatbots as interactive, human-like information provision agents, grounded in the computers are social actors (CASA) paradigm (Nass & Moon, 2000), through the lens of conversational tone (Kelleher & Miller, 2006) and cultural tailoring (Kreuter & McClure, 2004). The strategic communication literature on conversational tone highlights informal communication style (e.g., Men et al., 2023), which can improve public understanding and acceptance of complex, technical disaster information. Cultural tailoring, integrating both surface and deeper cultural elements (Resnicow et al., 1999) such as language, names, values, and beliefs, can enhance the chatbot’s relatability and credibility, thereby fostering user engagement. By creating GPT-4 chatbots that vary in tone and cultural tailoring, our study is among the first scholarly attempts to investigate how diverse communities perceive and interact with GenAI chatbots, and how this new approach can improve their disaster preparedness outcomes.
In a between-subjects experiment with 441 Black, Hispanic, and Caucasian Florida residents, participants interacted with chatbots powered by OpenAI’s GPT-4 API for hurricane preparedness, varying by tone formality and cultural tailoring, followed by a questionnaire. We then analyzed how the variations in chatbot communication influenced participants’ chatbot perceptions, and subsequently, their hurricane preparedness outcomes.
Theorizing GenAI chatbots in disaster communication: the CASA paradigm
Chatbots are software applications that engage users through natural language processing (NLP) capabilities (Shawar & Atwell, 2007). With recent advances in machine learning, chatbots have evolved significantly in mimicking human-like attributes (Goyal et al., 2018). Hancock et al. (2020) argued that AI-mediated communication could transform interpersonal communication by tailoring responses to individual user preferences and communication styles. However, distrust, or algorithm aversion (Dietvorst et al., 2015), stemming from a perceived lack of human-like qualities such as empathy or contextual understanding (Zhou et al., 2023), remains a challenge. Longoni et al. (2019) found that users tend to distrust AI health systems due to the perception that algorithms cannot account for unique individual differences the way a human might.
To address these challenges, the CASA paradigm provides a foundational framework for understanding how humans interact with technology. According to the CASA paradigm, people often respond to computers and other media as if they were actual social beings, applying social rules and behaviors to these interactions on a subconscious level (Nass et al., 1994; Nass & Moon, 2000). Extending this theory to media, Lombard and Xu (2021) further argued that various cues from new technologies can evoke a sense of social presence, making these technologies appear more “alive” or “social.” Thus, fostering humanness may be critical to overcoming distrust and promoting user engagement, especially in high-stakes contexts such as disaster communication. For instance, Go and Sundar (2019) found that using highly anthropomorphic visual cues, such as human-like profile pictures, can compensate for the negative effects of a machine-like conversation style.
Building upon the CASA paradigm in the context of disaster communication, this study explores two key anthropomorphic cues—conversational tone and cultural tailoring—to develop chatbots as proxies for information sources, specifically serving as AI agents to deliver disaster information. A conversational tone is defined as a “conversational human voice” and “an engaging and natural style of organizational communication” (Kelleher, 2009, p. 177). According to the CASA paradigm, by adopting a conversational tone, a crucial anthropomorphic attribute in interpersonal communication, chatbots can better mimic human communication to create a friendly and sociable presence and trigger social responses from users, making interactions feel more natural and engaging. Compared to traditional one-way broadcasting communication, a conversational tone within the organizational context can better facilitate open dialogue and bring about positive relational outcomes such as relational satisfaction (Kelleher & Miller, 2006). Indeed, Men et al. (2022) found that chatbots with a conversational tone can improve perceptions of organizational listening and transparency, leading to more positive organization–public relationships.
Cultural tailoring emerges as the second critical dimension, essential for meeting the diverse information needs of multiethnic communities. While prior studies have explored basic personalization techniques grounded in the CASA paradigm, such as using names (e.g., Chen et al., 2021) or demographic-based message tailoring (e.g., Whang et al., 2022), a deeper understanding of cultural differences in AI interactions remains lacking. Based on the CASA paradigm, when chatbots are culturally tailored to match the user’s racial identity, users may subconsciously respond as if they are interacting with an in-group member. This tailored experience could enhance user trust and their willingness to engage deeply with the chatbot. This effect might be more pronounced in disaster scenarios, where certain racial communities exhibit a heightened risk perception. Cultural and socioeconomic differences can affect user interactions with chatbots and disaster preparedness outcomes (Appleby-Arnold et al., 2018), highlighting the need for more research on culturally tailored chatbots and their effectiveness in disaster preparedness.
Taken together, enhancing the human-like qualities of GenAI chatbots through conversational tone and culturally tailored communication may help improve disaster communication outcomes for multiethnic communities.
Disaster preparedness outcomes
As disaster preparedness is central to effective disaster management, the current study focuses on three important indicators of disaster preparedness as identified in the literature (e.g., McLennan et al., 2020; McNeill et al., 2013): disaster-related information seeking intention, information sharing intention, and disaster preparedness efficacy.
Disaster information seeking is an active communication action in problem-solving processes (Kim et al., 2010). Information seeking is defined as the intentional scanning of the environment to gather information on a particular topic (Zhao & Tsang, 2021). In disaster communication, information seeking is vital in sensemaking processes. When faced with crises, threats, or other uncertain situations, individuals are naturally prompted to seek information as a coping mechanism to mitigate tensions and anxiety. However, research also suggests that disasters may overwhelm individuals, leading to avoidance rather than active information seeking (Seeger et al., 2003). Promoting information-seeking behavior is therefore needed to engage communities in disaster preparedness. With the rise of misinformation during disasters (Hunt et al., 2020), promoting information seeking from official sources, such as disaster management agencies, is particularly urgent.
Meanwhile, disaster information sharing intention refers to the willingness to transmit relevant information to a larger crowd to facilitate disaster coping or problem-solving. As Kim et al. (2010) suggest, a problem “is easier to solve when it comes to a problem of a group rather than that of an isolated individual” (p. 149). Active information sharing thus helps improve individuals’ and communities’ disaster preparedness.
Finally, a sense of disaster preparedness captures the perceived confidence and efficacy in preparing for and coping with a disaster. Research has proposed multiple dimensions under this construct, including a cognitive dimension that taps into individuals’ risk awareness, disaster knowledge, and physical preparation intentions, as well as a psychological dimension that indicates individuals’ psychological readiness for disasters (Paton, 2003).
Having identified the primary outcomes of disaster preparedness, the following sections detail communication tone and cultural tailoring, two key mechanisms in the disaster vulnerability and HCI literature.
The role of communication tone in chatbot disaster communication
The type of human–computer interaction within disaster communication is unique. While both information and social support needs arise from disasters, individuals’ interaction with public agencies is more likely to be driven by their informational needs. This tendency is evidenced by individuals’ strategic construction of disaster communication ecologies, where certain information sources—such as media and public agencies—are predominantly used for obtaining disaster information, while interpersonal communication becomes the preferred source for social support (Liu, 2022; Liu & Zhao, 2023; Zhao & Liu, 2024).
Information delivery and provision thus become the primary function in the organizational use of technology for disaster communication. To that end, research has identified key mechanisms—from both the communicator and recipient’s sides—that can facilitate public acceptance of disaster risk information. From the delivery side, an informal communication style has been particularly noted in the literature. Due to the knowledge gap and the technical nature of disaster information, an accessible, translational communication style can improve public understanding of science and risk information (e.g., Mabon, 2020). This echoes the rich body of strategic communication research on conversational human voice (e.g., Men et al., 2023), characterized by informal communication style. This style features casual, colloquial language, in contrast to a formal style that uses authoritative and structured language (Gretry et al., 2017).
From the message recipients’ side, we focus on two mediating factors of particular relevance to communicating disasters to multiethnic communities, friendliness and credibility perceptions. The decline of trust in public institutions has been particularly felt among marginalized, multiethnic communities (Best et al., 2021). Among racial minorities, community members are less likely to seek disaster information from public agencies than their White counterparts, likely due to factors such as systemic racism and hostile perceptions of government agencies (Best et al., 2021). The alienating experience of seeing public agencies as “others” rather than “one of us” can be greatly mitigated by the perception of friendliness. Just as intergroup contact theory confirms the critical effect of friendships on enhancing intergroup experience (e.g., Pettigrew, 1998), we similarly hypothesize that perceived friendliness can improve disaster communication outcomes.
Meanwhile, as rumors and misinformation often emerge from a disaster, a strong credibility perception towards official sources would motivate the public to seek information from public agencies over other sources (Austin et al., 2012; Zhao & Tsang, 2021). Credibility perception, therefore, serves as an equally important mediating mechanism of interest.
Organizations can strategically use an informal communication style to reduce the social distance from the public and make the interaction personal to foster engagement and trusting relationships (Liu et al., 2020). HCI literature suggests this can be achieved through the human-like design of chatbots. Specifically, Chaves and Gerosa (2021) demonstrated that chatbots employing a more casual, friendly tone were perceived as more likable, leading to higher user satisfaction than those using a formal tone. Go and Sundar (2019) identify three factors that can increase the level of humanness of chatbots—anthropomorphic cues, human identity cues, and conversational cues—which, in turn, lead to more favorable attitudes and greater behavioral intentions to return to a given website.
However, existing studies have shown that an informal communication style can backfire when perceived as inappropriate in the organizational context (Gretry et al., 2017), particularly during disasters when the public expects the government to be authoritative and credible (Zhao & Zhan, 2019). Taken together, the formality of a GenAI chatbot’s tone is hypothesized to impact user perceptions of the chatbot, specifically by decreasing perceived bot friendliness but increasing perceived bot credibility.
H1: The tone formality of a GenAI chatbot (a) negatively affects diverse residents’ perceptions of the chatbot’s friendliness and (b) positively affects credibility.
According to Johnson and Grayson (2005), perceived friendliness is crucial for establishing trust in user–chatbot relationships. This perception can evoke benevolent attitudes toward chatbots (Baek et al., 2022). The perceived friendliness can enhance the sense of social presence or the perception of interacting with a real human (Konya-Baumbach et al., 2023), and thus is more likely to increase trust and credibility. For example, a study revealed that the perceived friendliness of a chatbot, achieved through social and informal language cues, signaled its credibility and increased users’ willingness to disclose personal information (Jin & Eastin, 2022). Based on the evidence, we propose the following hypothesis:
H2: The perceived friendliness of a GenAI chatbot is positively associated with its perceived credibility.
Further, we assess agencies’ use of GenAI chatbots to improve individuals’ disaster preparedness outcomes. Previous research has revealed that perceived friendliness positively affects users’ evaluation of chatbots (Wang et al., 2023). By eliciting benevolent feelings toward the bots, friendliness perception helps promote prosocial and other compliant behaviors (Baek et al., 2022). For example, Verhagen et al. (2014) found that perceived friendliness positively predicted user satisfaction. Recent research has also shown that chatbots can improve communities’ disaster-coping experience by promoting community members’ communicative activeness and collective efficacy (e.g., Tsai et al., 2019). To test whether the use of the GenAI chatbot can enhance disaster preparedness outcomes through perceived friendliness and credibility, we propose the following hypotheses:
H3: The perceived friendliness of a GenAI chatbot positively relates to disaster preparedness outcomes, including (a) information-seeking intention, (b) sharing intention, and (c) preparedness.
H4: The perceived credibility of a GenAI chatbot positively relates to disaster preparedness outcomes, including (a) information-seeking intention, (b) sharing intention, and (c) preparedness.
To test whether and how perceived friendliness and credibility perceptions may mediate the relationship between the GenAI chatbot’s tone and disaster preparedness outcomes, we propose the following research question:
RQ1: How does the tone formality of a GenAI chatbot influence diverse residents’ disaster preparedness outcomes (including information seeking, sharing, and preparedness) through chatbot perceptions (perceived friendliness and credibility)?
The role of culturally tailored GenAI chatbots for diverse communities
As mentioned, it is crucial for GenAI chatbots to adapt to the cultural nuances of multiethnic communities, as ethnic minorities with cultural and socioeconomic differences can exhibit different patterns of interactions with chatbots, producing diverging disaster preparedness outcomes (Appleby-Arnold et al., 2018). Culture, broadly defined, can include race, ethnicity, gender, and sexual orientation. Given our focus on multiethnic communities, this study focuses on race/ethnicity as the major indicator of culture. Cultural tailoring, which adapts messages to the cultural characteristics of a specific group, helps communication interventions effectively address a given group’s decision-making needs and interests (Kreuter & McClure, 2004). As more disaster and risk communication research focuses on explaining and reducing disaster coping disparities among diverse communities, cultural tailoring has been proposed as a promising approach to bridge disparities (Huang & Shen, 2016). Ma and Zhao (2024) posited that cultural tailoring involves highlighting specific risks and coping implications for distinct cultural groups. For example, by differentiating between marginalized communities (e.g., the Hispanic or Black community) and the general population, disaster communication can be designed to meet the unique information needs of diverse groups and reduce disparities.
The strategies of cultural tailoring can be categorized into surface tailoring and deep tailoring (Resnicow et al., 1999). Surface-level communication tailoring adapts to a culture’s “superficial” aspects, including language, appearance, and diet. By contrast, deep tailoring engages with a community’s social, historical, and psychological characteristics, such as values, traditions, and norms—considered as the culture’s deep structure (e.g., Kreuter et al., 2004). A meta-analysis on cultural tailoring in cancer communication shows that culturally tailored cancer communication had a small significant effect on persuasion, and deep tailoring had a more substantial impact than surface tailoring (Huang & Shen, 2016).
The concept of cultural tailoring has rarely been investigated in the literature of disaster communication and HCI. Contextualizing cultural tailoring in disaster communication and GenAI chatbots, this study examines both surface-level and deep-level tailoring for more inclusive chatbot-human interactions. Surface-level tailoring includes adapting the chatbot’s language, slang, and the use of culturally familiar names to resonate with specific cultural groups. Drawing on the CASA paradigm, this approach can be viewed as a form of anthropomorphism, where culturally relevant cues enhance the chatbot’s perceived humanness and relatability. In contrast, deep-level tailoring involves embedding deeper cultural elements—such as values, traditions, religious beliefs, historical contexts, and cultural narratives—into chatbot design. This deep-level approach remains underexplored within HCI, partially because existing theories like the cue effects of the HAII-TIME model (Sundar, 2020) primarily focus on surface-level cues and heuristics rather than examining how deeply embedded cultural values and beliefs can foster trust and engagement among diverse groups. By integrating both aspects of cultural tailoring into GenAI–human interactions, our study has the potential to enrich user engagement and disaster preparedness outcomes for multiethnic communities. Given the lack of research on culturally tailored GenAI chatbots in disaster preparedness, we propose the following research questions to explore their impact on diverse communities’ perceptions of the chatbots and subsequent disaster-related outcomes:
RQ2: How does a culturally tailored GenAI chatbot influence diverse residents’ chatbot perceptions (perceived friendliness and credibility)?
RQ3: How does a culturally tailored GenAI chatbot influence diverse residents’ disaster preparedness outcomes (information-seeking intention, sharing intention, and preparedness) through chatbot perceptions (perceived friendliness and credibility)?
Methods
Following the approval of the University Institutional Review Board, we conducted an online between-subjects experiment from February to March 2024. Participants aged 18 and older in Florida were recruited through Prolific, an online research platform. Once enrolled, participants were directed to a Qualtrics survey and informed about using a GPT-4 chatbot, simulating a scenario of a local government’s hurricane preparedness education. Consented participants first completed screening and indicated their primary racial/ethnic identity. After reviewing the instructions, participants were randomly assigned via Qualtrics to one of the chatbot conditions programmed to generate unique uniform resource locators (URLs) for different conditions. Clicking on the assigned URLs redirected participants to our web browser interface to start their interactions with designated chatbots. Afterward, they were instructed to return to Qualtrics to complete the survey by answering questions measuring the outcome and demographic variables. At the end, participants were debriefed, compensated, and provided with links to emergency response resources from federal and state emergency response agencies.
Participants
We used quota sampling to represent Florida residents by race/ethnicity, intentionally oversampling minorities, including Blacks and Hispanics, to achieve an equal distribution among different racial/ethnic groups in our sample. This allowed us to understand chatbot utilization for hurricane preparedness among vulnerable communities. The final sample size was 441 after removing 11 responses that failed all attention check questions. A comparison of the demographics between the final sample and the excluded responses did not reveal any patterns associated with attention check failure. In the final sample, the average age was 38.38 (SD = 14.08). The sample consisted of 148 (33.56%) White/Caucasian, 150 (34.01%) Black/African American, and 143 (32.43%) Hispanic/Latino participants. Within the sample, 62.36% were women, 34.47% were men, and 3.17% reported non-binary. Approximately 18.37% had high school or lower education levels, 70.98% had a partial or full college education, and 10.66% had a graduate degree or above. The median household income was between $25,000 to $49,999.
OpenAI’s GPT-4 API and web server
Our chatbot utilizes OpenAI’s Chat Completions API, leveraging GenAI capabilities of an internally trained model designated as “gpt-4-1106-preview.” It is important to understand the distinction between GPT-4 API and ChatGPT. GPT-4 API provides a scalable and flexible framework that allows for programming-based customization of multiple chatbots, enabling their interactions with thousands of users simultaneously. This chatbot was developed from the open-source “openaichatexample” project on GitHub. We enhanced it by adding adaptive prompts for various experimental conditions, logging chat histories and timestamps for analysis, configuring the web server for load handling and pressure tests and managing daily data backup during deployment (for technical details, see Supplementary material P1). Our work selected GPT as the AI engine for our chatbot due to its superior performance compared to other large language models, as validated by extensive computing and NLP experiments (Duan et al., 2024; Ziems et al., 2024). The source code of our chatbot and its user instructions are available at: https://bitbucket.org/leecwwong/ai_chatbot/.
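As a minimal illustration of this setup, the sketch below assembles per-condition system prompts into the message list that OpenAI's Chat Completions endpoint expects. The prompt wording and function names are condensed placeholders, not the study's actual prompts (those are in Supplementary material P2):

```python
# Sketch (assumed names/wording): composing per-condition system prompts
# for a Chat Completions call with model "gpt-4-1106-preview".
BASE_PROMPT = ("You are an agent from a Florida emergency management team, "
               "providing reliable hurricane-preparedness information on "
               "behalf of a government agency.")

TONE_PROMPTS = {  # hypothetical condensed versions of the P2 prompts
    "formal": "Use official, authoritative language befitting a government agency.",
    "informal": "Use casual language, acronyms, and emojis when appropriate.",
}

def build_messages(tone, tailored, race, history, user_msg):
    """Assemble the message list sent to the chat completions endpoint."""
    system = BASE_PROMPT + " " + TONE_PROMPTS[tone]
    if tailored:
        system += (" Adapt your language and examples to the needs of "
                   f"{race} communities in Florida.")
    # The full chat history is resent on every turn so the (stateless)
    # model can reason over the whole conversation.
    return ([{"role": "system", "content": system}]
            + history
            + [{"role": "user", "content": user_msg}])

# A real call would then be, e.g.:
# client.chat.completions.create(model="gpt-4-1106-preview", messages=msgs)
```

Because the system prompt travels with every request, one web server can serve many differently conditioned chatbots concurrently, which is what makes the GPT-4 API more flexible than ChatGPT for experiments like this one.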
GenAI chatbot manipulation
We developed different sets of system prompts as input for OpenAI’s GPT-4 API, aiming to train different versions of GenAI chatbots. Initially, we created a general prompt in which GPT-4 was instructed to simulate an agent from a Florida emergency management team providing reliable information as a member of a government agency for participants across all conditions. We then created four specific sets of prompts as manipulations of the chatbot’s tone (formal vs. informal) and cultural tailoring (tailored vs. generic). Building on the literature (Gretry et al., 2017), the tone was trained to vary between an informal tone—characterized by casual language, acronyms, and emojis when appropriate—and a formal tone, characterized by official and authoritative language representing an official agency (for details, see Supplementary material P2).
Following the literature on cultural tailoring (Huang & Shen, 2016), the culturally tailored chatbot adapted its conversation based on the participant’s race/ethnicity (a dynamic variable determined by their input). For example, this chatbot was prompted to “adapt your language, tone, slang, acronyms, emojis, and other textual cues as appropriate based on the %race%,” and “provide credible and accurate information, knowledge, and/or support that addresses the common needs and challenges faced by the %race% communities in Florida” (for details, see Supplementary material P2). With Hispanic and Black participants, it used a culturally familiar name, provided bilingual support (Hispanic only), suggested the inclusion of culturally relevant items in the emergency kit, acknowledged the unique needs and concerns of the community, and emphasized family unity and cultural preservation. In contrast, the generic chatbot provided hurricane preparation information without cultural tailoring. Note that cultural tailoring was not applied for White/Caucasian participants, who all received the generic condition, as disaster preparedness information predominantly addresses the needs and perspectives of this majority group, inherently aligning with their cultural background and serving as the baseline for comparison. Examples of chat scripts are included in Supplementary material P3.
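The %race% placeholder above is a dynamic variable filled from the participant's self-reported identity. A minimal substitution sketch (the template paraphrases the supplement's wording rather than reproducing it):

```python
# Minimal sketch of the %race% placeholder substitution described above.
# The template text paraphrases the supplement's prompt; exact wording differs.
TAILORED_TEMPLATE = (
    "Adapt your language, tone, slang, acronyms, emojis, and other textual "
    "cues as appropriate based on the %race%. Provide credible and accurate "
    "information that addresses the common needs and challenges faced by "
    "the %race% communities in Florida."
)

def fill_race(template: str, race: str) -> str:
    # %race% is set from the participant's self-reported racial/ethnic
    # identity at the start of the interaction.
    return template.replace("%race%", race)
```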
GenAI chatbot–user interaction process
Participants were invited to evaluate a prototype GenAI chatbot for disaster preparedness education. They were instructed to imagine that a hurricane was forecasted for the upcoming weeks and interact with the chatbot to learn more about hurricane preparedness. Participants were randomly assigned to chatbot conditions on Qualtrics through a two-step process. First, a random number generator determined whether they would interact with a chatbot using an informal or formal tone. Then, a Qualtrics randomizer function assigned participants to either a culturally tailored chatbot, matching their primary racial/ethnic identity, or a non-tailored control chatbot (for technical details, see Supplementary material P4). Supplementary Table S4.2 confirms a successful randomization check. A custom URL, incorporating these assignments, directed participants to the appropriate web browser-based interface for their interaction. The web server received the encoded assigned condition through the custom URL and transmitted the relevant prompts to OpenAI’s GPT-4 API at the backend. This ensured that participants experienced interactions aligned with their assigned conditions without seeing the actual prompts. Each participant’s message and the entire chat history are sent to GPT-4 for reasoning and response. If multiple messages are sent before a reply, GPT-4 addresses them collectively. Participants were asked to interact with the chatbot for at least five minutes, and our server logged their chat history and response times.
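As an illustration of this routing step, the sketch below encodes a hypothetical condition assignment into a custom URL and decodes it server-side; the URL and parameter names are invented for illustration. In deployment one might encode an opaque token instead, so participants cannot infer their condition from the URL:

```python
from urllib.parse import urlencode, urlparse, parse_qs

BASE_URL = "https://chatbot.example.org/chat"  # hypothetical interface URL

def condition_url(tone: str, tailored: bool, race: str) -> str:
    """Encode the Qualtrics-assigned condition into the redirect URL."""
    query = urlencode({"tone": tone, "tailor": int(tailored), "race": race})
    return f"{BASE_URL}?{query}"

def decode_condition(url: str) -> dict:
    """Server-side: recover the condition to select the matching prompts."""
    q = parse_qs(urlparse(url).query)
    return {"tone": q["tone"][0],
            "tailored": bool(int(q["tailor"][0])),
            "race": q["race"][0]}
```

The server then passes the decoded condition's prompts to the GPT-4 API at the backend, so the participant experiences the assigned condition without ever seeing the prompts themselves.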
Pilot test
A pilot test (n = 40) was conducted to ensure the web server’s functionality and the manipulation’s appropriateness. Participants from various racial communities responded favorably to the chatbot, supporting its appropriateness and utility. Their feedback led to adjustments in our prompts for the main study. These adjustments focused on prioritizing the provision of information over making promises about actions, such as dialing 911 or contacting hotlines, starting interactions in English before presenting options for additional languages, and changing language when requested.
Measurement
Bot credibility
We used five items from Flanagin and Metzger (2003) to measure bot credibility: “To the best of your judgment, is the chatbot [believable, accurate, trustworthy, biased, or complete]?” (1 = not at all, 7 = very much). The mean was 5.92 (SD = 1.00, Cronbach’s alpha = .88).
Bot friendliness
We used five items from Koh and Sundar (2010) to measure bot friendliness: “To me, is the chatbot [empathetic, personal, warm, willing to listen, or open]?” (1 = not at all, 7 = very much). The mean was 5.30 (SD = 1.31, alpha = .91).
Information seeking intention
Adapted from Zhao and Tsang’s (2021) scale, participants reported the likelihood they would seek hurricane preparation information from state government agencies and local government agencies on a 7-point scale ranging from 1 “not likely at all,” to 7 “extremely likely.” The mean of information seeking intention was 5.36 (SD = 1.02, alpha = .75).
Information sharing intention
Following Zhao and Tsang (2021), participants also indicated the extent to which they were likely to share the information provided with people they know, such as family, friends, and co-workers (M = 5.54, SD = 1.28, alpha = .80).
Disaster preparedness
Disaster preparedness measures were adapted from McNeill et al. (2013) and McLennan et al. (2020). Participants rated their level of agreement with six statements on a 7-point Likert scale. Example statements included “If a hurricane takes place, I know how to take proper actions to ensure my family’s and my safety” and “I feel confident about protecting my family and me against the negative impact of a hurricane.” (M = 5.75, SD = 0.85, alpha = .86). One item “I do not feel anxious even with an impending hurricane affecting my area” was removed due to its lower internal consistency in the sample.
Covariates
Disaster experience, educational level, and racial identification were used as covariates. Disaster experience was measured through a binary variable indicating whether participants had previously experienced any hurricanes. To measure participants’ identification with their racial/ethnic group, we adopted three items from Leach et al. (2008). For instance, participants rated the statement “The fact that I am a [Hispanic/Latino] is an important part of my identity” on a 7-point Likert scale (M = 4.88, SD = 1.75, alpha = .90). The term within brackets corresponded to the participant’s self-reported racial/ethnic identity.
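The reliability coefficients reported throughout this section (e.g., alpha = .88) follow the standard Cronbach's alpha formula, alpha = k/(k−1) × (1 − Σ item variances / variance of the total score). A self-contained sketch:

```python
def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: one list of scores per item (all of equal length).
    alpha = k/(k-1) * (1 - sum(item variances) / variance(total score))
    """
    k = len(items)
    n = len(items[0])

    def var(xs):
        # Population variance; sample variance yields the same alpha
        # because the n/(n-1) factor cancels in the ratio.
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    totals = [sum(col[i] for col in items) for i in range(n)]
    return k / (k - 1) * (1 - sum(var(col) for col in items) / var(totals))
```

Intuitively, the more the items covary (i.e., the more the total-score variance exceeds the sum of item variances), the closer alpha gets to 1.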
Analytical scheme
An exploratory computational analysis was first conducted on all chat scripts to identify the prevalent topics, providing the context for subsequent statistical analyses (for details, see Supplementary material P6). To test the hypotheses and research questions on chatbot perceptions and disaster outcomes, we conducted structural equation modeling through the R “Lavaan” package. Two models were analyzed, including the overall sample (N = 441) and a sub-sample comprising Hispanic and Black participants (n = 293), which provided a more focused test of the effect of cultural tailoring. In the structural model, exogenous variables were the two manipulated chatbot conditions coded as binary variables and covariates (not shown in the figure for simplicity). Mediators included perceived cultural tailoring, bot friendliness, and bot credibility. Endogenous variables included information-seeking intention, information-sharing intention, and disaster preparedness (Figure 1). In the measurement model, latent constructs with fewer than three items were identified through all items. To identify latent constructs with more than three items, a parceling approach was used to create composite items (see notes of Figure 1 for details). Table 1 shows descriptive statistics and a correlation matrix of these variables for the full sample. Parameters were estimated by maximum likelihood. The model was evaluated using standard cutoff values for the model-data fit indices (Hu & Bentler, 1999). The bootstrap method (N = 5,000, biased corrected) was used to estimate indirect effects.
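The bias-corrected bootstrap used for the indirect effects can be sketched generically as follows. In the analysis, the statistic would be the a × b product of path estimates; the example below plugs in a sample mean purely for illustration:

```python
import random
from statistics import NormalDist

def bc_bootstrap_ci(data, stat, n_boot=5000, alpha=0.05, seed=7):
    """Bias-corrected bootstrap confidence interval.

    data: list of observations; stat: function mapping a sample to a float.
    """
    rng = random.Random(seed)
    theta = stat(data)  # estimate on the original sample
    boots = sorted(stat([rng.choice(data) for _ in data])
                   for _ in range(n_boot))
    nd = NormalDist()
    # Bias-correction factor z0, from the share of bootstrap estimates
    # falling below the original estimate.
    prop = sum(b < theta for b in boots) / n_boot
    z0 = nd.inv_cdf(min(max(prop, 1e-6), 1 - 1e-6))
    # Shift the percentile endpoints by twice the bias correction.
    p_lo = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
    p_hi = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))
    return boots[int(p_lo * (n_boot - 1))], boots[int(p_hi * (n_boot - 1))]
```

When the bootstrap distribution is symmetric around the original estimate, z0 is near zero and the interval reduces to the ordinary percentile bootstrap.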

Figure 1. Full sample results from SEM.
Note. N = 441. ***p < .001, **p < .01, *p < .05. Control variables include disaster experience, education, and racial identification. Hurricane preparedness was identified by two composite items. The first composite item comprised the average of the following three items: “If a hurricane takes place, I know how to take proper actions to ensure my family’s and my safety,” “If a hurricane occurs, I know how to prepare my properties to minimize the physical damage of the hurricane on my properties,” and “I feel confident about protecting my family and I against the negative impact of a hurricane.” The second composite item comprised the average of the following two items: “I have the ability needed to cope with an incoming hurricane,” and “I can access all the resources needed to cope with an incoming hurricane.”
Table 1. Descriptive statistics and correlations among key variables (full sample).

Variable | Mean (SD) | Manip Tone | MC Tone | Manip Tailor | MC Tailor | Racial Identity | Experience | Bot Friendliness | Bot Credibility | Info Share | Info Seek | Preparedness
---|---|---|---|---|---|---|---|---|---|---|---|---|
Manip Tone | 0.51 (0.50) | 1 | 0.57*** | 0.01 | −0.04 | 0.02 | −0.01 | −0.31*** | 0.06 | 0.03 | −0.03 | −0.03 |
MC Tone | 0.39 (0.49) | 0.57*** | 1 | 0.03 | 0.04 | 0.05 | −0.07 | −0.27*** | 0.10* | 0.01 | 0.02 | −0.06 |
Manip Tailor | 0.52 (0.50) | 0.01 | 0.03 | 1 | 0.10* | 0.06 | 0.02 | −0.03 | −0.09 | −0.02 | 0.03 | 0.01 |
MC Tailor | 3.66 (1.10) | −0.04 | 0.04 | 0.10* | 1 | 0.01 | 0.01 | 0.08 | 0.15** | 0.13** | 0.01 | 0.03 |
Racial Identity | 4.84 (1.62) | 0.02 | 0.05 | 0.06 | 0.01 | 1 | −0.09 | 0.19*** | 0.08 | 0.16** | 0.19*** | −0.02 |
Experience | 0.94 (0.24) | −0.01 | −0.07 | 0.02 | 0.01 | −0.09 | 1 | −0.06 | −0.01 | −0.08 | −0.10* | 0.11* |
Bot Friendliness | 5.30 (1.31) | −0.31*** | −0.27*** | −0.03 | 0.08 | 0.19*** | −0.06 | 1 | 0.57*** | 0.39*** | 0.25*** | 0.18*** |
Bot Credibility | 5.92 (1.00) | 0.06 | 0.10* | −0.09 | 0.15** | 0.08 | −0.01 | 0.57*** | 1 | 0.38*** | 0.26*** | 0.27*** |
Info Share | 5.54 (1.28) | 0.03 | 0.01 | −0.02 | 0.13** | 0.16** | −0.08 | 0.39*** | 0.38*** | 1 | 0.43*** | 0.17*** |
Info Seek | 5.36 (1.02) | −0.01 | 0.02 | 0.03 | 0.01 | 0.19*** | −0.10* | 0.25*** | 0.26*** | 0.43*** | 1 | 0.13** |
Preparedness | 5.75 (0.85) | −0.03 | −0.06 | 0.01 | 0.03 | −0.02 | 0.11* | 0.18*** | 0.27*** | 0.17*** | 0.13** | 1 |
Note. N = 441.
***p < .001.
**p < .01.
*p < .05.
Results
Manipulation checks
On average, participants interacted with the chatbot for 6 min and 42 s (SD = 4 min and 13 s). In total, participants entered 3,615 textual messages, while the GenAI chatbots generated 4,233. Given potential randomness in texts generated by GenAI chatbots, we conducted two sets of manipulation checks testing both actual and perceived tone formality and cultural tailoring. First, we measured the actual levels of linguistic manipulation using a computational analysis of all chat scripts through OpenAI’s GPT-4 model (see Supplementary material P5 for the prompts). The actual level of cultural tailoring in the chatbot text (0–5) was measured by summing five binary indicators, such as a culturally familiar agent name or proposed language options (for details, see Supplementary material P5). The actual level of tone informality was measured as the ratio of colloquial words, slang, acronyms, emojis, and emoticons in the text. Our independent-sample t-tests confirmed the effectiveness of manipulations in actual texts: t(443) = 21.46, p < .001, Cohen’s d = 2.01 for tone informality, and t(294) = −21.80, p < .001, Cohen’s d = −2.57 for cultural tailoring. The ratio of colloquial words in the informal tone condition was 18.9%, compared to 0.3% in the formal tone condition. The cultural tailoring score was 3.76 in the tailored condition, compared to 0.86 in the generic condition.
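The tone-informality ratio can be approximated with a simple lexicon-based count; the study itself used GPT-4 to label informal cues (Supplementary material P5), so the mini-lexicon below is a hypothetical stand-in for illustration only:

```python
import re

# Hypothetical mini-lexicon; the actual check had GPT-4 label colloquial
# words, slang, acronyms, emojis, and emoticons across all chat scripts.
INFORMAL = {"gonna", "wanna", "y'all", "btw", "lol", "hey"}
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

def informality_ratio(text: str) -> float:
    """Share of tokens that are informal markers (emoji count as tokens)."""
    tokens = re.findall(r"[\w']+", text.lower())
    emoji_n = len(EMOJI.findall(text))
    if not tokens and not emoji_n:
        return 0.0
    informal = sum(t in INFORMAL for t in tokens) + emoji_n
    return informal / (len(tokens) + emoji_n)
```

Averaging this ratio over each condition's scripts mirrors the contrast reported above (18.9% informal vs. 0.3% formal).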
To validate GPT-4’s measures against human judgment, three expert coders conducted binary coding of tone and cultural tailoring on 20 randomly sampled chat scripts (Krippendorff’s alpha: .95 for tone and .82 for cultural tailoring). The results showed strong point-biserial correlations between human judgments and GPT-4 measures: 0.88 for tone and 0.73 for cultural tailoring (both p < .001), supporting the validity of GPT-4’s measures. Additionally, the GPT-4 measure of tone was validated against the LIWC dictionary, specifically the “Conversation” corpus (Boyd et al., 2022), using the full sample. The strong Pearson’s correlation of 0.73 (p < .001) between tone measured by GPT-4 and LIWC further supported the robustness of GPT-4’s tone measures.
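The point-biserial correlation used for this validation is simply Pearson's r computed with a 0/1-coded variable; a minimal implementation:

```python
from math import sqrt

def point_biserial(binary, scores):
    """Point-biserial correlation: Pearson's r with a 0/1-coded variable.

    Here, `binary` would be the human coders' binary labels and `scores`
    the corresponding GPT-4 continuous measures.
    """
    n = len(binary)
    mb = sum(binary) / n
    ms = sum(scores) / n
    cov = sum((b - mb) * (s - ms) for b, s in zip(binary, scores)) / n
    sb = sqrt(sum((b - mb) ** 2 for b in binary) / n)
    ss = sqrt(sum((s - ms) ** 2 for s in scores) / n)
    return cov / (sb * ss)
```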
The effectiveness of our manipulations was also supported using self-reported perceptions. For perceived tone, all participants indicated the chatbot’s communication style from three options: casual style, formal style, or do not remember. A significant majority (77.68%) correctly recognized the assigned style, indicating effective tone manipulation.
Hypotheses and research questions
For the full sample, the model-data fit was satisfactory (Figure 1).

Figure 2. The Hispanic and Black sample results from SEM.
Note. N = 293. ***p < .001, **p < .01, *p < .05. Control variables include disaster experience, education, and racial identification. Hurricane preparedness was identified by two composite items. The first composite item comprised the average of the following three items: “If a hurricane takes place, I know how to take proper actions to ensure my family’s and my safety,” “If a hurricane occurs, I know how to prepare my properties to minimize the physical damage of the hurricane on my properties,” and “I feel confident about protecting my family and I against the negative impact of a hurricane.” The second composite item comprised the average of the following two items: “I have the ability needed to cope with an incoming hurricane,” and “I can access all the resources needed to cope with an incoming hurricane.”
H1 predicted that the conversational tone of the GenAI chatbot significantly predicted diverse residents’ chatbot perceptions, including perceived friendliness and credibility, and H2 predicted that the chatbot’s perceived friendliness was positively associated with perceived credibility. Our results showed that the manipulated tone formality negatively predicted the perceived chatbot friendliness (b = −1.00, SE = 0.13, p < .001) and positively predicted the perceived bot credibility (b = 0.58, SE = 0.10, p < .001), supporting H1. Perceived friendliness of the chatbot was positively associated with perceived credibility: b = 0.45, SE = 0.04, p < .001, confirming H2.
H3 hypothesized that the perceived friendliness of chatbots positively related to disaster preparedness outcomes including information-seeking intention, sharing intention, and hurricane preparedness, and H4 hypothesized that the perceived credibility of chatbots positively related to these disaster preparedness outcomes. Perceived chatbot friendliness positively related to the intent to share information with family and friends (b = 0.18, SE = 0.04, p < .001), but not information seeking or hurricane preparedness. H3 was partially supported. Additionally, perceived chatbot credibility positively related to information-seeking intention (b = 0.29, SE = 0.06, p < .001), sharing intention (b = 0.35, SE = 0.07, p < .001), and hurricane preparedness (b = 0.19, SE = 0.05, p < .001). H4 was fully supported.
RQ2 asked how cultural tailoring of the interaction with AI chatbot affected diverse communities’ chatbot perceptions (perceived friendliness and credibility). Our results showed that manipulated cultural tailoring affected perceived cultural tailoring (b = 0.21, SE = 0.10, p = .040), which in turn positively affected perceived chatbot credibility (b = 0.14, SE = 0.04, p < .001). To validate the effect of cultural tailoring on chatbot perceptions, we also relied on the Hispanic and Black subsample. Figure 2 shows that cultural tailoring perceived by the Hispanic and Black participants significantly predicted perceived chatbot friendliness (b = 0.33, SE = 0.13, p = .008), which subsequently related to perceived bot credibility (b = 0.64, SE = 0.13, p < .001). Taken together, the findings suggested that perceived cultural tailoring predicted both perceptions differently for different racial communities.
A series of indirect effects were tested to answer RQ1 and RQ3. For RQ1, there was a significant indirect effect of the manipulated tone formality of the chatbot on information-sharing intent through bot friendliness (b = −0.18, SE = 0.048, 95% CI [−0.303, −0.080]). There was also a significant indirect effect from the manipulated tone formality to information-sharing intent through bot credibility: b = 0.17, SE = 0.04, 95% CI [0.081, 0.287]. For RQ3, there were indirect effects from perceived cultural tailoring to information-seeking intent (b = 0.019, SE = 0.01, 95% CI [0.007, 0.044]), information-sharing intent (b = 0.033, SE = 0.01, 95% CI [0.010, 0.073]), and preparedness (b = 0.022, SE = 0.01, 95% CI [0.007, 0.047]) through bot credibility.
Discussion
Building upon the CASA paradigm and the literature on disaster vulnerability and cultural tailoring, this study designed GenAI chatbots in the context of disaster preparedness, testing the roles of tone informality and cultural tailoring in organization-public disaster communication. Leveraging OpenAI’s GPT-4 API that offers a scalable and flexible framework for programming-based customization of chatbots, our results from an online experiment with diverse communities show that GPT-4 chatbots varying in tone and cultural tailoring can significantly affect the perceived friendliness and credibility of chatbots, which are subsequently related to hurricane preparedness outcomes, including information seeking intent, information sharing intent, and preparedness. These results are discussed in detail as follows.
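Prompt-based customization of this kind can be sketched as follows. The condition wording below is illustrative only (the study's actual prompts are available in its public repository), and only the prompt-assembly step is shown, not the API call; in deployment, the returned string would be sent as the system message of a chat-completion request.

```python
def build_system_prompt(tone: str, tailoring: str) -> str:
    """Compose a system prompt for one cell of a 2 (tone) x 2 (cultural
    tailoring) chatbot design. Wording is a hypothetical sketch."""
    base = ("You are a hurricane-preparedness assistant for a Florida "
            "emergency management agency. Give accurate, actionable advice.")
    tone_rules = {
        "formal": "Use a formal, professional tone; avoid emojis and slang.",
        "informal": "Use a casual, friendly tone; greetings and emojis are welcome.",
    }
    tailoring_rules = {
        "generic": "Address all residents in the same general way.",
        "tailored": ("Acknowledge the user's cultural background, community "
                     "networks, and values when giving preparedness advice."),
    }
    return " ".join([base, tone_rules[tone], tailoring_rules[tailoring]])
```

Because every condition shares the same base instructions, differences between chatbot versions are confined to the manipulated tone and tailoring clauses.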
First, the SEM results show that the tone formality of GenAI chatbots was positively related to perceived bot credibility and negatively related to perceived bot friendliness. This suggests a complex role for chatbot tone in disaster communication between an emergency management agency and diverse communities. On the one hand, a chatbot's informal tone, an important indicator of conversational human voice, increased perceived chatbot friendliness. On the other hand, a chatbot's formal tone directly enhanced perceived source credibility, likely increasing information trustworthiness. Despite these opposing paths, our findings suggest a positive net effect of tone formality, via bot perceptions, on hurricane preparedness outcomes. From the standpoint of disaster management agencies, it is therefore advisable to predominantly use a formal tone while enhancing chatbot humanness through other appropriate elements, such as dialogic communication or humor. A contextually appropriate conversational tone can enhance perceived bot humanness without lowering trust or causing a backfire effect, potentially leading to higher information engagement in government-public disaster communication.
The open-ended responses from survey participants provided further insight into the mixed results. When asked to reflect on their experience interacting with the GenAI chatbots, most participants approved of the chatbot for providing useful and credible information, but a few assigned to the informal tone condition indicated that the tone might be too casual to match the agency's authoritative status. This perceived incongruence may elicit negative feelings toward the agency and the information obtained. Meanwhile, some participants expressed favorable attitudes toward informal language cues, such as the use of emojis or casual greetings. These inconclusive findings invite future research to explore contextually and culturally appropriate ways to implement conversational human voice in the context of disasters.
Additionally, different GenAI chatbot perceptions facilitated disaster preparedness outcomes in distinct ways. Specifically, perceived bot friendliness was positively associated only with disaster information-sharing intention. In contrast, perceived bot credibility consistently predicted all three disaster preparedness outcomes. These findings reaffirm the importance of source credibility and trust, well established in the HCI literature (e.g., Johnson & Grayson, 2005). In disaster preparedness, the more credible individuals perceive a chatbot to be, the more likely they are to take the next step of acquiring and sharing information. Such credibility perceptions also translate into a higher level of disaster preparedness, consistent with the disaster response literature showing that trust in and perceived credibility of emergency management authorities can feed into greater disaster preparedness (e.g., Wachinger et al., 2013). In sum, our results suggest that enhancing the humanness of GenAI chatbots can improve public perceptions of and engagement with chatbots in the context of disaster preparedness, contributing to a community's disaster resilience.
Our results further underscore the potential of culturally tailored GenAI chatbots in disaster communication for multiethnic communities. Despite potential randomness in texts generated by GenAI chatbots compared to rule-based chatbots, our cultural tailoring manipulation successfully achieved both the actual degree of cultural tailoring in chat scripts and the perceived degree of cultural tailoring among Hispanic and Black participants. The robustness of our manipulation check indicates that cultural tailoring could be a promising strategy for enhancing GenAI chatbot communication with diverse cultural groups. Yet, the effect size of cultural tailoring was small, suggesting a discrepancy between the effectiveness of the intended and perceived cultural tailoring. This may be attributed to the use of similar system prompts across cultural groups: the lack of nuanced cultural tailoring might have diminished its effectiveness by unintentionally triggering stereotypes. Another reason could be our emphasis on surface-level tailoring more than deep-level tailoring, even though both were employed. Meta-analysis shows deep-level tailoring is typically more effective (Huang & Shen, 2016). To increase the effectiveness of culturally tailored chatbots in disaster communication, future research could develop distinct prompts for specific cultural groups, balancing generic disaster information with culturally relevant values and beliefs. For Black communities, it might be essential to incorporate church and faith-based messaging, emphasize community solidarity, and acknowledge historical contexts. For Hispanic/Latino communities, cultural tailoring could focus on family values and promote collectivism.
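The distinct-prompts idea suggested above could be sketched as a per-group prompt registry. The group labels and fragment wording here are hypothetical illustrations of the proposed direction, not validated message content from the study.

```python
# Hypothetical per-group prompt fragments illustrating the deep-tailoring
# suggestion above; wording is a sketch, not validated content.
GROUP_FRAGMENTS = {
    "black": ("Where relevant, reference church and faith-based networks, "
              "emphasize community solidarity, and acknowledge historical "
              "context when discussing trust in authorities."),
    "hispanic": ("Where relevant, center family responsibilities and "
                 "collective preparedness, and offer to continue in Spanish."),
}
GENERIC_FRAGMENT = "Provide preparedness advice applicable to all residents."

def tailor_system_prompt(base: str, group: str) -> str:
    """Append a culturally specific fragment to a base system prompt,
    falling back to generic wording for unlisted groups."""
    return base + " " + GROUP_FRAGMENTS.get(group, GENERIC_FRAGMENT)
```

Keeping the base prompt constant while varying only the appended fragment preserves the balance between generic disaster information and culturally relevant values that the text above calls for.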
Last, perceived cultural tailoring in the chatbot's information enhanced its credibility, with slight variations across racial groups. Among Hispanic and Black participants, perceived cultural tailoring also improved perceptions of chatbot friendliness, which further reinforced credibility. Credible chatbots, in turn, contributed to higher information-seeking and information-sharing intentions, as well as better preparedness outcomes. The positive impact of cultural tailoring aligns with prior research on the significance of culture in health communication (Huang & Shen, 2016), highlighting the necessity of culturally appropriate disaster communication technologies. It also provides the first empirical evidence of the benefits of cultural tailoring in human-AI disaster communication.
Theoretical implications
Our study offers significant theoretical implications by extending the CASA paradigm (Nass et al., 1994) to the contexts of disaster communication and diverse cultural groups.
By establishing GenAI chatbots as information provision agents, this study investigates the roles of anthropomorphism through tone formality and cultural tailoring—unique dimensions of human-AI interactions in disasters. The integration of GenAI chatbots into a communicative approach to disaster management (e.g., Heath et al., 2009) offers a novel contribution to understanding how AI technologies can facilitate organization-public disaster communication, particularly in the under-investigated area of disaster preparedness. Specifically, our finding that the formal tone of GenAI chatbots increased perceived credibility but reduced friendliness in organizational disaster communication underscores the multifaceted nature of anthropomorphism in chatbot interactions during disasters. While disaster communication literature emphasizes the importance of formal, authoritative tones to enhance credibility (Zhao & Tsang, 2021), HCI research highlights the value of chatbot friendliness in fostering user engagement and satisfaction (Jin & Eastin, 2022). This points to the need for a more nuanced understanding when conceptualizing and designing the anthropomorphism of chatbots in applied communication scenarios such as disasters.
Traditionally, anthropomorphism in technology involves attributing human-like characteristics to non-human entities, such as giving chatbots human names or avatars (Go & Sundar, 2019). However, the impact of cultural tailoring in improving perceived chatbot credibility suggests that users perceived our AI agents not only as social actors but also as culturally relevant ones. This implies that anthropomorphism could go beyond human-like features to include how these features resonate with users’ cultural backgrounds. Cultural tailoring represents a deeper dimension of chatbot anthropomorphism by aligning communication with specific cultural norms, values, and expectations—surpassing surface-level adjustments like using human names. This deeper form of anthropomorphism may foster stronger engagement between users and AI systems, potentially enhancing perceptions of machine intelligence and agency (Sundar, 2020). Future research could investigate whether incorporating various cultural components into chatbots strengthens perceived machine agency and how these factors influence strategic communication outcomes.
Furthermore, our study highlights the promise of GenAI chatbots in experimental designs. Our results show that theoretical constructs such as cultural tailoring can be validly manipulated in GenAI chatbots via prompt engineering: the robustness of our manipulation was confirmed by both textual analysis and user self-reports. To enhance the reproducibility and transparency of our research, we have made the complete source code for developing our chatbots openly available. By providing this open access, we encourage more sophisticated experiments in AI-mediated communication that explore contexts beyond disaster communication, to better understand the broader applicability and limitations of GenAI chatbots.
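As a toy illustration of a script-level manipulation check of the kind described above (the study itself used LIWC-22 plus self-reports), one could compare a crude informality score across generated chat scripts. The marker set below is an assumption for demonstration, not LIWC's validated dictionary.

```python
import re

# Assumed informal markers -- a stand-in for a validated dictionary.
INFORMAL_MARKERS = {"hey", "gonna", "y'all", "awesome", "🙂", "👍"}

def informality_score(text: str) -> float:
    """Fraction of tokens (words or emoji) that are informal markers."""
    tokens = re.findall(r"[\w']+|[\U0001F300-\U0001FAFF]", text.lower())
    return sum(t in INFORMAL_MARKERS for t in tokens) / len(tokens) if tokens else 0.0
```

A successful tone manipulation would show a reliably higher mean score for scripts generated under the informal condition than under the formal one.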
Practical implications
This study also provides practical implications for disaster management agencies. First, the findings reiterate the importance of providing culturally tailored information for diverse community members. This can be especially meaningful in cultivating or restoring trust with a specific cultural community such as the African American community, which has long exhibited distrust of government and authorities due to many sociohistorical factors (Best et al., 2021). Disaster management agencies may also consider customizing formal versus informal communication styles when interacting with diverse cultural groups based on their needs, values, and traditions. While our study focuses on the disaster preparedness phase, suggesting that a formal tone may be more effective for delivering critical information and enhancing chatbot credibility, communication in other disaster stages may benefit from a more informal, friendly communication style, such as providing social support during or after disasters.
Limitations and future directions
This study has several limitations. First, our GPT-4 GenAI chatbot showed proficiency in disaster preparedness and could provide some local information. However, the absence of specific localized data, such as up-to-date listings of nearby shelters, curtailed our ability to fine-tune the chatbot, sometimes resulting in generalized advice (e.g., urging users to visit official websites). One future direction involves augmenting GPT-4 with localized data through collaboration with agencies. We also acknowledge the challenge of aligning chatbot cultural tailoring with user perceptions across cultural groups, which may reduce the effect size of the intervention. Future research should prioritize the creation of culturally specific prompts tailored to local communities (e.g., Haitian Americans). This approach could help bridge the gap between intended and perceived cultural alignment, ultimately enhancing disaster preparedness outcomes. In addition, our results should be generalized with caution, as they are based on a specific GenAI model (i.e., GPT-4) and may not be directly applicable to other generative AI models with different modalities, architectures, or training data. This highlights opportunities for future research to investigate the impact of various AI models in different communication contexts.
Another limitation of this study is the absence of direct measures for AI-related control variables such as AI chatbot use experience and AI literacy, which may influence users’ engagement with our GenAI chatbots. To mitigate this, we included educational level as a proxy control variable; however, future research should incorporate specific AI-related measures to better account for these potential confounding factors. Lastly, while this study primarily examined how the anthropomorphism of GenAI chatbots predicts disaster preparedness outcomes, future research could further explore information processing mechanisms. For instance, investigating how tone formality and cultural tailoring trigger specific machine heuristics and influence user engagement through the HAII-TIME model (Sundar, 2020) may provide deeper insights into the trust-building process.
Conclusion
This study enriches the CASA paradigm within the context of disaster communication and vulnerability and reveals the potential of GenAI Chatbots in experimental designs. Culturally tailored communication via GPT-4 chatbots to multiethnic communities can enhance chatbot perceptions and disaster preparedness. While humanizing chatbots through an informal tone can increase their perceived friendliness, it may also undermine their credibility and the effectiveness of disaster preparedness outcomes.
Supplementary material
Supplementary material is available online at Journal of Computer-Mediated Communication.
Data availability
Data are available upon request to the corresponding author. For access to the data or inquiries regarding this article, please contact Xinyan Zhao at ezhao@unc.edu.
The source code of our chatbot and user instructions are available at: https://bitbucket.org/leecwwong/ai_chatbot/.
Conflicts of interest: None declared.
Note
Source code and user instructions are available at: https://bitbucket.org/leecwwong/ai_chatbot/.
References
Appleby-Arnold S., Brockdorff N., Jakovljev I., Zdravković S. (2018). Applying cultural values to encourage disaster preparedness: Lessons from a low-hazard country. International Journal of Disaster Risk Reduction, 31, 37–44. 10.1016/j.ijdrr.2018.04.015
Austin L., Fisher Liu B., Jin Y. (2012). How audiences seek out crisis information: Exploring the social-mediated crisis communication model. Journal of Applied Communication Research, 40(2), 188–207. 10.1080/00909882.2012.654498
Baek T. H., Bakpayev M., Yoon S., Kim S. (2022). Smiling AI agents: How anthropomorphism and broad smiles increase charitable giving. International Journal of Advertising, 41(5), 850–867. 10.1080/02650487.2021.2011654
Bari L. F., Ahmed I., Ahamed R., Zihan T. A., Sharmin S., Pranto A. H., Islam M. R. (2023). Potential use of artificial intelligence (AI) in disaster risk and emergency health management: A critical appraisal on environmental health. Environmental Health Insights, 17, 11786302231217808. 10.1177/11786302231217808
Best A. L., Fletcher F. E., Kadono M., Warren R. C. (2021). Institutional distrust among African Americans and building trustworthiness in the COVID-19 response: Implications for ethical public health practice. Journal of Health Care for the Poor and Underserved, 32(1), 90–98. 10.1353/hpu.2021.0010
Boyd R. L., Ashokkumar A., Seraj S., Pennebaker J. W. (2022). The development and psychometric properties of LIWC-22. University of Texas at Austin.
Chaves A. P., Gerosa M. A. (2021). How should my chatbot interact? A survey on social characteristics in human–chatbot interaction design. International Journal of Human–Computer Interaction, 37(8), 729–758. 10.1080/10447318.2020.1841438
Chen J., Chen C., Walther J. B., Sundar S. S. (2021, May). Do you feel special when an AI doctor remembers you? Individuation effects of AI vs. human doctors on user experience. In Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems (pp. 1–7).
Duan J., Zhang R., Diffenderfer J., et al. (2024). GTBench: Uncovering the strategic reasoning limitations of LLMs via game-theoretic evaluations. arXiv preprint arXiv:2402.12348.
Dietvorst B. J., Simmons J. P., Massey C. (2015). Algorithm aversion: People erroneously avoid algorithms after seeing them err. Journal of Experimental Psychology: General, 144(1), 114–126. 10.1037/xge0000033
Flanagin A. J., Metzger M. J. (2003). The perceived credibility of personal Web page information as influenced by the sex of the source. Computers in Human Behavior, 19(6), 683–701. 10.1016/S0747-5632(03)00021-9
Ghaffarian S., Kerle N., Pasolli E., Jokar Arsanjani J. (2019). Post-disaster building database updating using automated deep learning: An integration of pre-disaster OpenStreetMap and multi-temporal satellite data. Remote Sensing, 11(20), 2427. 10.3390/rs11202427
Go E., Sundar S. S. (2019). Humanizing chatbots: The effects of visual, identity and conversational cues on humanness perceptions. Computers in Human Behavior, 97, 304–316. 10.1016/j.chb.2019.01.020
Goyal P., Pandey S., Jain K. (2018). Deep learning for natural language processing. Apress.
Gretry A., Horváth C., Belei N., van Riel A. C. R. (2017). "Don't pretend to be my friend!" When an informal brand communication style backfires on social media. Journal of Business Research, 74, 77–89. 10.1016/j.jbusres.2017.01.012
Hancock J. T., Naaman M., Levy K. (2020). AI-mediated communication: Definition, research agenda, and ethical considerations. Journal of Computer-Mediated Communication, 25(1), 89–100. 10.1093/jcmc/zmz022
Heath R. L., Lee J., Ni L. (2009). Crisis and risk approaches to emergency management planning and communication: The role of similarity and sensitivity. Journal of Public Relations Research, 21(2), 123–141. 10.1080/10627260802557415
Howard A., Agllias K., Bevis M., Blakemore T. (2017). "They'll tell us when to evacuate": The experiences and expectations of disaster-related communication in vulnerable groups. International Journal of Disaster Risk Reduction, 22, 139–146. 10.1016/j.ijdrr.2017.03.002
Hu L.-T., Bentler P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55. 10.1080/10705519909540118
Huang Y., Shen F. (2016). Effects of cultural tailoring on persuasion in cancer communication: A meta-analysis. Journal of Communication, 66(4), 694–715. 10.1111/jcom.12243
Hunt K., Wang B., Zhuang J. (2020). Misinformation debunking and cross-platform information sharing through Twitter during Hurricanes Harvey and Irma: A case study on shelters and ID checks. Natural Hazards, 103(1), 861–883. 10.1007/s11069-020-04016-6
Jay A. K., Crimmins A. R., Avery C. W., Dahl T. A., Dodder R. S., Hamlington B. D., Lustig A., Marvel K., Méndez-Lazaro P. A., Osler M. S., Terando A., Weeks E. S., Zycherman A. (2023). Ch. 1: Overview—Understanding risks, impacts, and responses. In Crimmins A. R., Avery C. W., Easterling D. R., Kunkel K. E., Stewart B. C., Maycock T. K. (Eds.), Fifth National Climate Assessment. U.S. Global Change Research Program, Washington, DC, USA. 10.7930/NCA5.2023.CH1
Jin E., Eastin M. S. (2022). When a chatbot smiles at you: The psychological mechanism of friendly language use by product recommendation chatbots. Cyberpsychology, Behavior, and Social Networking, 25(9), 597–604. 10.1089/cyber.2021.0318
Johnson D., Grayson K. (2005). Cognitive and affective trust in service relationships. Journal of Business Research, 58(4), 500–507. 10.1016/S0148-2963(03)00140-1
Kelleher T. (2009). Conversational voice, communicated commitment, and public relations outcomes in interactive online communication. Journal of Communication, 59(1), 172–188. 10.1111/j.1460-2466.2008.01410.x
Kelleher T., Miller B. M. (2006). Organizational blogs and the human voice: Relational strategies and relational outcomes. Journal of Computer-Mediated Communication, 11(2), 395–414. 10.1111/j.1083-6101.2006.00019.x
Kim J.-N., Grunig J. E., Ni L. (2010). Reconceptualizing the communicative action of publics: Acquisition, selection, and transmission of information in problematic situations. International Journal of Strategic Communication, 4(2), 126–154. 10.1080/15531181003701913
Koh Y. J., Sundar S. S. (2010). Heuristic versus systematic processing of specialist versus generalist sources in online media. Human Communication Research, 36(2), 103–124. 10.1111/j.1468-2958.2010.01370.x
Konya-Baumbach E., Biller M., von Janda S. (2023). Someone out there? A study on the social presence of anthropomorphized chatbots. Computers in Human Behavior, 139, 107513. 10.1016/j.chb.2022.107513
Kreuter M., McClure S. (2004). The role of culture in health communication. Annual Review of Public Health, 25, 439–455. 10.1146/annurev.publhealth.25.101802.123000
Kreuter M., Skinner C. S., Steger-May K., Holt C. L., Bucholtz D. C., Clark E. M., Haire-Joshu D. (2004). Responses to behaviorally vs culturally tailored cancer communication among African American women. American Journal of Health Behavior, 28(3), 195–207. 10.5993/ajhb.28.3.1
Leach C. W., van Zomeren M., Zebel S., Vliek M. L. W., Pennekamp S. F., Doosje B., Ouwerkerk J. W., Spears R. (2008). Group-level self-definition and self-investment: A hierarchical (multicomponent) model of in-group identification. Journal of Personality and Social Psychology, 95(1), 144–165. 10.1037/0022-3514.95.1.144
Lin J. T., Melgar D., Thomas A. M., Searcy J. (2021). Early warning for great earthquakes from characterization of crustal deformation patterns with deep learning. Journal of Geophysical Research: Solid Earth, 126(10), e2021JB022703. 10.1029/2021jb022703
Liu W. (2022). Disaster communication ecology in multiethnic communities: Understanding disaster coping and community resilience from a communication resource approach. Journal of International and Intercultural Communication, 15(1), 94–117. 10.1080/17513057.2020.1854329
Liu W., Xu W. W., Tsai J.-Y. (2020). Developing a multi-level organization-public dialogic communication framework to assess social media-mediated disaster communication and engagement outcomes. Public Relations Review, 46(4), 101949. 10.1016/j.pubrev.2020.101949
Liu W., Zhao X. (2023). How communication ecology impacts disaster support seeking in multiethnic communities: The roles of disaster communication network size, heterogeneity, and localness. Mass Communication and Society, 26(5), 773–800. 10.1080/15205436.2022.2129390
Lombard M., Xu K. (2021). Social responses to media technologies in the 21st century: The Media are Social Actors paradigm. Human-Machine Communication, 2, 29–55. 10.30658/hmc.2.2
Longoni C., Bonezzi A., Morewedge C. K. (2019). Resistance to medical artificial intelligence. Journal of Consumer Research, 46(4), 629–650. 10.1093/jcr/ucz013
[Reference entry garbled in source] 10.1016/j.uclim.2019.100537
[Reference entry garbled in source] 10.1080/03637751.2023.2289491
[Reference entry garbled in source] 10.1007/s11069-020-03866-4
[Reference entry garbled in source] 10.1111/risa.12037
Men L. R., Zhou A., Sunny Tsai W. H. (2022). Harnessing the power of chatbot social conversation for organizational listening: The impact on perceived transparency and organization-public relationships. Journal of Public Relations Research, 34(1–2), 20–44. 10.1080/1062726X.2022.2068553
Men L. R., Zhou A., Jin J., Thelen P. (2023). Shaping corporate character via chatbot social conversation: Impact on organization-public relationship outcomes. Public Relations Review, 49(5), 102385. 10.1016/j.pubrev.2023.102385
Nass C., Moon Y. (2000). Machines and mindlessness: Social responses to computers. Journal of Social Issues, 56(1), 81–103. 10.1111/0022-4537.00153
Nass C., Steuer J., Tauber E. R. (1994, April). Computers are social actors. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (pp. 72–78).
Paton D. (2003). Disaster preparedness: A social-cognitive perspective. Disaster Prevention and Management: An International Journal, 12(3), 210–216. 10.1108/09653560310480686
Pettigrew T. F. (1998). Intergroup contact theory. Annual Review of Psychology, 49(1), 65–85. 10.1146/annurev.psych.49.1.65
Resnicow K., Baranowski T., Ahluwalia J. S., Braithwaite R. L. (1999). Cultural sensitivity in public health: Defined and demystified. Ethnicity & Disease, 9(1), 10–21. https://www.jstor.org/stable/45410142
Seeger M. W., Sellnow T. L., Ulmer R. R. (2003). Communication and organizational crisis. Praeger.
Shawar B. A., Atwell E. (2007). Chatbots: Are they really useful? Journal for Language Technology and Computational Linguistics, 22(1), 29–49.
Sundar S. S. (2020). Rise of machine agency: A framework for studying the psychology of human–AI interaction (HAII). Journal of Computer-Mediated Communication, 25(1), 74–88. 10.1093/jcmc/zmz026
Tsai M.-H., Chen J. Y., Kang S.-C. (2019). Ask Diana: A keyword-based chatbot system for water-related disaster management. Water, 11(2), 234. 10.3390/w11020234
Verhagen T., van Nes J., Feldberg F., van Dolen W. (2014). Virtual customer service agents: Using social presence and personalization to shape online service encounters. Journal of Computer-Mediated Communication, 19(3), 529–545. 10.1111/jcc4.12066
Wachinger G., Renn O., Begg C., Kuhlicke C. (2013). The risk perception paradox—Implications for governance and communication of natural hazards. Risk Analysis, 33(6), 1049–1065. 10.1111/j.1539-6924.2012.01942.x
Wang C., Li Y., Fu W., Jin J. (2023). Whether to trust chatbots: Applying the event-related approach to understand consumers' emotional experiences of interacting with chatbots in e-commerce. Journal of Retailing and Consumer Services, 73, 103325. 10.1016/j.jretconser.2023.103325
Whang J. B., Song J. H., Lee J. H., Choi B. (2022). Interacting with chatbots: Message types and consumer control. Journal of Business Research, 153, 309–318. 10.1016/j.jbusres.2022.08.012
Zhao X., Liu W. (2024). Examining the dynamics of interpersonal communication networks in disaster responses across multiethnic communities. Communication Monographs, 91(3), 351–372. 10.1080/03637751.2023.2290681
Zhao X., Tsang S. J. (2021). Self-protection by fact-checking: How pandemic information seeking and verifying affect preventive behaviours. Journal of Contingencies and Crisis Management, 30(2), 171–184. 10.1111/1468-5973.12372
Zhao X., Zhan M. (2019). Touching the heart: How social media communication characteristics affect audience liking of information during the Manchester terrorist attack. International Journal of Communication, 13, 3826–3847. https://ijoc.org/index.php/ijoc/article/view/11816
Zhou Q., Li B., Han L., Zhou M. (2023). Talking to a bot or a wall? How chatbots vs. human agents affect anticipated communication quality. Computers in Human Behavior, 143, 107674. 10.1016/j.chb.2023.107674
Ziems C., Held W., Shaikh O., Chen J., Zhang Z., Yang D. (2024). Can large language models transform computational social science? Computational Linguistics, 50(1), 237–291. 10.1162/coli_a_00502
© The Author(s) 2025. Published by Oxford University Press on behalf of the International Communication Association.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.