Internship Project Notes

Shopee Shenzhen AI Chatbot Evaluation Project

Resume entry: AI Chatbot Testing and Optimization. Led the design of the AI chatbot testing process; designed test question sets for the main scenarios, such as general shopping guidance and single-product Q&A; defined and refined a weighted scoring standard for AI answers. Analyzed test scores and error distributions across model versions, reported results to the R&D team, and discussed directions for model optimization, raising accuracy on core metrics such as relevance and readability by more than 5%.
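The resume entry mentions a weighted scoring standard for AI answers but does not give its dimensions or weights. The sketch below assumes a hypothetical rubric (relevance, accuracy, and readability, each scored 0 to 2, weighted 0.4/0.4/0.2) purely for illustration, and shows how per-answer weighted scores could be averaged per test scenario. The scenario names are also assumptions.

```python
# Hypothetical rubric: the dimensions, weights, and 0-2 scale are illustrative
# assumptions, not the project's actual scoring standard.
WEIGHTS: dict[str, float] = {"relevance": 0.4, "accuracy": 0.4, "readability": 0.2}


def weighted_score(scores: dict[str, float], weights: dict[str, float] = WEIGHTS) -> float:
    """Weighted average of per-dimension scores (weights need not sum to 1)."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing dimension scores: {sorted(missing)}")
    return sum(w * scores[d] for d, w in weights.items()) / sum(weights.values())


def mean_by_scenario(cases: list[tuple[str, dict[str, float]]]) -> dict[str, float]:
    """Average weighted score per scenario, e.g. 'general_guide' vs 'single_product_qa'."""
    buckets: dict[str, list[float]] = {}
    for scenario, scores in cases:
        buckets.setdefault(scenario, []).append(weighted_score(scores))
    return {s: sum(v) / len(v) for s, v in buckets.items()}


if __name__ == "__main__":
    cases = [
        ("single_product_qa", {"relevance": 2, "accuracy": 1, "readability": 2}),
        ("single_product_qa", {"relevance": 2, "accuracy": 2, "readability": 1}),
        ("general_guide", {"relevance": 1, "accuracy": 1, "readability": 2}),
    ]
    print(mean_by_scenario(cases))
```

Normalizing by the sum of the weights keeps the result on the same 0 to 2 scale as the raw scores, so a rubric can be reweighted without changing how totals are read.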

1. All the data used so far comes from Seller Chat. After launch, what data will be used to keep tuning the model? How will that data be collected? How can we tell which data is useful? How should it be used for tuning?
2. How do we collect user feedback and make use of it?
3. Did you look at competitors' chatbots while building this one? Which competitors could serve as references? (Looking back now, what could have been done better?)
4. Also review the design of current e-commerce chatbots and see whether they offer any useful insights.

1. Project Background

🔹 Goals

Build an evaluation system for the AI chatbot to optimize the large language model's (LLM's) performance in e-commerce shopping-guide scenarios.
Improve LLM generation quality, including accuracy, readability, and relevance in single-product Q&A and general shopping-guide scenarios.
Systematize the AI evaluation process to provide data support and optimization feedback for each LLM version iteration.
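One way evaluation results can feed back into version iteration is a per-scenario regression check between two model versions run on the same question set. This is a minimal sketch under stated assumptions, not the project's pipeline: the input format (a list of weighted scores per scenario), the scenario names, and the 0.05 tolerance are all hypothetical.

```python
from statistics import mean


def scenario_deltas(old: dict[str, list[float]], new: dict[str, list[float]]) -> dict[str, float]:
    """Mean score change per scenario; only scenarios present in both runs are compared.

    Each list must be non-empty, otherwise statistics.mean raises StatisticsError.
    """
    return {s: mean(new[s]) - mean(old[s]) for s in old.keys() & new.keys()}


def regressions(
    old: dict[str, list[float]],
    new: dict[str, list[float]],
    tolerance: float = 0.05,  # assumed threshold, tune to the score scale in use
) -> list[str]:
    """Scenarios whose mean score dropped by more than `tolerance`, sorted by name."""
    return sorted(s for s, d in scenario_deltas(old, new).items() if d < -tolerance)


if __name__ == "__main__":
    v1 = {"general_guide": [1.0, 1.2], "single_product_qa": [1.5, 1.5]}
    v2 = {"general_guide": [1.3, 1.3], "single_product_qa": [1.2, 1.2]}
    print(scenario_deltas(v1, v2))
    print(regressions(v1, v2))
```

Reporting deltas per scenario rather than one overall average keeps a gain in one scenario from masking a drop in another.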

🔹 Key Challenges

Diverse LLM generation errors (misunderstood intent, knowledge-base retrieval failures, insufficient reasoning, etc.).
Managing large-scale evaluation data efficiently (processing several hundred test cases per week while maintaining annotation quality).
High cross-team communication costs (coordinating with the development, operations, and annotation teams to drive bug fixes and model optimization).
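For the first two challenges, two small helpers may be useful: tallying the share of each error type among failed answers, and measuring agreement between two annotators who labeled the same cases. The label strings are hypothetical, and Cohen's kappa is a standard agreement metric swapped in here for illustration; these notes do not say how annotation quality was actually checked.

```python
from collections import Counter

# Error labels mirror the categories listed above; the exact strings are hypothetical.
ERROR_TYPES = ("intent_misunderstanding", "kb_retrieval_failure", "insufficient_reasoning", "other")


def error_distribution(labels: list[str]) -> dict[str, float]:
    """Share of each error type among failed cases; unknown labels count as 'other'."""
    counts = Counter(lab if lab in ERROR_TYPES else "other" for lab in labels)
    total = sum(counts.values())
    return {t: (counts[t] / total if total else 0.0) for t in ERROR_TYPES}


def cohens_kappa(a: list[str], b: list[str]) -> float:
    """Cohen's kappa for two annotators labeling the same cases in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: probability both annotators pick the same label independently.
    expected = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if expected == 1.0:  # both annotators used one identical label for every case
        return 1.0
    return (observed - expected) / (1 - expected)
```

Comparing `error_distribution` outputs for two model versions shows which error types shrank or grew between releases, and a low kappa on a sampled batch suggests the labeling guidelines may need clarifying before the scores are relied on.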