Claude Sonnet 3.7 is the leading LLM for AI SEO: Report
Benchmark reveals which LLMs you can use for some SEO tasks. It also reminds us that humans are more reliable than AI (for now anyway).
Danny Goodwin on April 29, 2025 at 10:50 am | Reading time: 2 minutes
Claude Sonnet 3.7 is the top-performing large language model (LLM) for SEO tasks – it outperforms competitors like Google’s Gemini, Meta’s Llama, and xAI’s Grok. That’s according to SEO agency Previsible’s new AI SEO Benchmark report.
By the numbers. Claude Sonnet 3.7 “performed the best across the board,” earning an 83% score. But that score fell short against human SEOs (who scored 89%).
LLMs averaged:
- 85% on content tasks.
- 79% on technical SEO.
- 63% on ecommerce SEO.
Here’s how the other language models scored:
- Perplexity: 82%
- Gemini 2.5: 81%
- ChatGPT 4o: 79%
- ChatGPT o3-mini: 78%
- Copilot: 78%
- Deepseek: 78%
- Gemini 2.0 Flash: 71%
- Llama 4: 71%
- Grok 3: 71%
Why we care. AI is getting better at handling various routine SEO tasks (e.g., content generation, keyword mapping). However, the real value in SEO comes from human expertise: strategic planning, technical execution, cross-discipline collaboration, and creative problem-solving. Relying too heavily on LLMs could expose brands to costly SEO mistakes and lost search visibility.
Persona helps. One interesting finding was that adding a persona to a prompt (e.g., “you are an SEO expert”) improves performance by 2.8%, on average.
What doesn’t help. Allowing LLMs to use web search resulted in 3.2% worse performance on average. Also, deep research resulted in 5.7% worse performance, on average.
About the data. Previsible created a 50-question SEO test set covering key categories like content, technical SEO, and ecommerce. Each question had objectively correct answers based on established best practices and was independently scored by multiple SEO experts to ensure consistency.
The benchmark measures accuracy – so an 83% score means a model answered 83% of questions correctly. All models were tested across different modes (e.g., with and without SEO personas, web search access) to evaluate how various features impacted performance.
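To make the scoring arithmetic concrete, here is a minimal sketch of how an accuracy score could be computed per testing mode and then averaged. The mode names, the graded results, and the helper `accuracy` are hypothetical illustrations, not Previsible's actual data or method; the report does not say how scores from different modes or graders were combined.

```python
from statistics import mean


def accuracy(graded):
    """Return the share of answers marked correct, as a percentage."""
    if not graded:
        raise ValueError("no graded answers")
    return 100 * sum(graded) / len(graded)


# Hypothetical graded results for one model on a 50-question test set.
# True means the answer was judged correct.
runs = {
    "baseline": [True] * 40 + [False] * 10,
    "persona": [True] * 42 + [False] * 8,
}

# Accuracy per mode, then a simple unweighted average across modes
# (the averaging rule is an assumption made for this example).
per_mode = {mode: accuracy(graded) for mode, graded in runs.items()}
overall = mean(list(per_mode.values()))

for mode, score in per_mode.items():
    print(f"{mode}: {score:.1f}%")
print(f"overall: {overall:.1f}%")
```

Comparing the per-mode figures is how a lift such as the persona effect would show up: in this made-up data, the persona run scores four points higher than the baseline.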
Between the lines. The core flaw of using LLMs for SEO? AI is probabilistic – it predicts, it doesn’t know.
- “Until [models] are 99%+ reliable, it’s impossible to rely too heavily on them. Your best bet is using them for what they’re good at – like building content briefs or identifying internal link opportunities using embeddings,” according to David Bell, Previsible SEO co-founder.
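As an illustration of the embeddings idea Bell mentions, the sketch below compares page embeddings with cosine similarity and flags similar pages that do not yet link to each other. Everything here is an assumption made for the example: the URLs, the three-dimensional vectors (real embeddings come from an embedding model and have hundreds of dimensions), the `link_candidates` helper, and the 0.8 threshold. It is not Previsible's workflow.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def link_candidates(page_vectors, existing_links, threshold=0.8):
    """Return similar page pairs that are not already linked, best first."""
    candidates = []
    for (url_a, vec_a), (url_b, vec_b) in combinations(page_vectors.items(), 2):
        if frozenset((url_a, url_b)) in existing_links:
            continue  # already linked, nothing to suggest
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            candidates.append((url_a, url_b, score))
    return sorted(candidates, key=lambda c: c[2], reverse=True)


# Hypothetical toy embeddings keyed by URL path.
pages = {
    "/running-shoes": [0.9, 0.1, 0.0],
    "/trail-running-shoes": [0.8, 0.2, 0.1],
    "/return-policy": [0.0, 0.1, 0.9],
}
# Pairs of pages that already link to each other (hypothetical).
linked = {frozenset(("/running-shoes", "/return-policy"))}

suggestions = link_candidates(pages, linked)
for url_a, url_b, score in suggestions:
    print(f"{url_a} <-> {url_b}: {score:.2f}")
```

The output is a shortlist for a human to review, which fits the article's point: the model surfaces candidates, and an SEO decides which links are worth adding.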
What’s next. Previsible plans to update its AI SEO Benchmark here.
The report. Leaderboard Launch: Previsible’s New AI SEO Benchmark