Salary surveys worldwide regularly place software architect in the top 10 best jobs, yet no real guide exists to help developers become architects. Until now. This book provides the first comprehensive overview of software architecture’s many aspects. Aspiring and existing architects alike will examine architectural characteristics, architectural patterns, component determination, diagramming and presenting architecture, evolutionary architecture, and many other topics.
Mark Richards and Neal Ford, hands-on practitioners who have taught software architecture classes professionally for years, focus on architecture principles that apply across all technology stacks. You’ll explore software architecture in a modern light, taking into account all the innovations of the past decade.
This book examines:
Architecture patterns: The technical basis for many architectural decisions
Components: Identification, coupling, cohesion, partitioning, and granularity
Soft skills: Effective team management, meetings, negotiation, presentations, and more
Modernity: Engineering practices and operational approaches that have changed radically in the past few years
Architecture as an engineering discipline: Repeatable results, metrics, and concrete valuations that add rigor to software architecture
“Whether you’re new to the role or you’ve been a practicing architect for many years, this book will help you be better at your job. I only wish they’d written this earlier in my career.”
-Nathaniel Schutta
Architect as a Service, ntschutta.io
“This book will serve as a guide for many as they navigate their journey to software architecture mastery.”
-Rebecca J. Parsons
CTO, ThoughtWorks
Mark Richards is an experienced software architect involved in the architecture, design, and implementation of microservices architectures, event-driven architectures, and other distributed systems.
Neal Ford is director, software architect, and meme wrangler at ThoughtWorks, a global IT consultancy with a focus on end-to-end software development and delivery. Neal was also chief technology officer at the DSW Group.
Praise for Fundamentals of Software Architecture
Neal and Mark aren’t just outstanding software architects; they are also exceptional teachers. With Fundamentals of Software Architecture, they have managed to condense the sprawling topic of architecture into a concise work that reflects their decades of experience. Whether you’re new to the role or you’ve been a practicing architect for many years, this book will help you be better at your job. I only wish they’d written this earlier in my career.
-Nathaniel Schutta, Architect as a Service, ntschutta.io
Mark and Neal set out to achieve a formidable goal: to elucidate the many, layered fundamentals required to excel in software architecture, and they completed their quest.
The software architecture field continuously evolves, and the role requires a daunting breadth and depth of knowledge and skills. This book will serve as a guide for many as they navigate their journey to software architecture mastery.
-Rebecca J. Parsons, CTO, ThoughtWorks
Mark and Neal truly capture real-world advice for technologists to drive architecture excellence. They achieve this by identifying common architecture characteristics and the trade-offs that are necessary to drive success.
-Cassie Shum, Technical Director, ThoughtWorks
Fundamentals of Software Architecture: An Engineering Approach
Mark Richards and Neal Ford
Acquisitions Editor: Chris Guzikowski
Development Editors: Alicia Young and Virginia Wilson
Production Editor: Christopher Faucher
Copyeditor: Sonia Saruba
Proofreader: Amanda Kersey
Indexer: Ellen Troutman-Zaig
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest
February 2020: First Edition
Revision History for the First Edition
2020-01-27: First Release
2020-06-12: Second Release
2020-11-06: Third Release
2021-02-12: Fourth Release
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Fundamentals of Software Architecture, the cover image, and related trade dress are trademarks of O’Reilly Media, Inc.
The views expressed in this work are those of the authors, and do not represent the publisher’s views. While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
Table of Contents
Preface: Invalidating Axioms
Introduction
Defining Software Architecture
Expectations of an Architect
Make Architecture Decisions
Continually Analyze the Architecture
Keep Current with Latest Trends
Ensure Compliance with Decisions
Diverse Exposure and Experience
Have Business Domain Knowledge
Possess Interpersonal Skills
Understand and Navigate Politics
Intersection of Architecture and…
Engineering Practices
Operations/DevOps
Process
Data
Laws of Software Architecture
Part I. Foundations
Negotiation and Leadership Skills
Negotiation and Facilitation
Negotiating with Business Stakeholders
Negotiating with Other Architects
Negotiating with Developers
The Software Architect as a Leader
The 4 C’s of Architecture
Be Pragmatic, Yet Visionary
Leading Teams by Example
Integrating with the Development Team
Summary
Developing a Career Path
The 20-Minute Rule
Developing a Personal Radar
The ThoughtWorks Technology Radar
Open Source Visualization Bits
Using Social Media
Parting Words of Advice
Appendix. Self-Assessment Questions
Index
Preface: Invalidating Axioms
Axiom
A statement or proposition which is regarded as being established, accepted, or self-evidently true.
Mathematicians create theories based on axioms, assumptions for things indisputably true. Software architects also build theories atop axioms, but the software world is, well, softer than mathematics: fundamental things continue to change at a rapid pace, including the axioms we base our theories upon.
The software development ecosystem exists in a constant state of dynamic equilibrium: while it exists in a balanced state at any given point in time, it exhibits dynamic behavior over the long term. A great modern example of the nature of this ecosystem follows the ascension of containerization and the attendant changes: tools like Kubernetes didn’t exist a decade ago, yet now entire software conferences exist to service its users. The software ecosystem changes chaotically: one small change causes another small change; when repeated hundreds of times, it generates a new ecosystem.
Architects have an important responsibility to question assumptions and axioms left over from previous eras. Many of the books about software architecture were written in an era that only barely resembles the current world. In fact, the authors believe that we must question fundamental axioms on a regular basis, in light of improved engineering practices, operational ecosystems, and software development processes: everything that makes up the messy, dynamic equilibrium where architects and developers work each day.
Careful observers of software architecture over time witnessed an evolution of capabilities. Starting with the engineering practices of Extreme Programming, continuing with Continuous Delivery, the DevOps revolution, microservices, containerization, and now cloud-based resources, all of these innovations led to new capabilities and trade-offs. As capabilities changed, so did architects’ perspectives on the industry. For many years, the tongue-in-cheek definition of software architecture was “the stuff that’s hard to change later.” Later, the microservices architecture style appeared, where change is a first-class design consideration.
Each new era requires new practices, tools, measurements, patterns, and a host of other changes. This book looks at software architecture in modern light, taking into account all the innovations from the last decade, along with some new metrics and measures suited to today’s new structures and perspectives.
The subtitle of our book is “An Engineering Approach.” Developers have long wished to change software development from a craft, where skilled artisans can create one-off works, to an engineering discipline, which implies repeatability, rigor, and effective analysis. While software engineering still lags behind other types of engineering disciplines by many orders of magnitude (to be fair, software is a very young discipline compared to most other types of engineering), architects have made huge improvements, which we’ll discuss. In particular, modern Agile engineering practices have allowed great strides in the types of systems that architects design.
We also address the critically important issue of trade-off analysis. As a software developer, it’s easy to become enamored with a particular technology or approach. But architects must always soberly assess the good, bad, and ugly of every choice, and virtually nothing in the real world offers convenient binary choices; everything is a trade-off. Given this pragmatic perspective, we strive to eliminate value judgments about technology and instead focus on analyzing trade-offs to equip our readers with an analytic eye toward technology choices.
This book won’t make someone a software architect overnight; it’s a nuanced field with many facets. We want to provide existing and burgeoning architects a good modern overview of software architecture and its many aspects, from structure to soft skills. While this book covers well-known patterns, we take a new approach, leaning on lessons learned, tools, engineering practices, and other input. We take many existing axioms in software architecture and rethink them in light of the current ecosystem, and design architectures, taking the modern landscape into account.
Conventions Used in This Book
The following typographical conventions are used in this book:
Italic
Indicates new terms, URLs, email addresses, filenames, and file extensions.
Constant width
Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.
Constant width bold
Shows commands or other text that should be typed literally by the user.
Constant width italic
Shows text that should be replaced with user-supplied values or by values determined by context.
This element signifies a tip or suggestion.
Using Code Examples
Supplemental material (code examples, exercises, etc.) is available for download at http://fundamentalsofsoftwarearchitecture.com.
If you have a technical question or a problem using the code examples, please send email to bookquestions@oreilly.com.
This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you’re reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O’Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product’s documentation does require permission.
We appreciate, but generally do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: “Fundamentals of Software Architecture by Mark Richards and Neal Ford (O’Reilly). Copyright 2020 Mark Richards, Neal Ford, 978-1-492-04345-4.”
If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.
O’Reilly Online Learning
For more than 40 years, O’Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.
Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O’Reilly’s online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O’Reilly and 200+ other publishers. For more information, please visit http://oreilly.com.
How to Contact Us
Please address comments and questions concerning this book to the publisher:
O’Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/fundamentals-of-softwarearchitecture.
Email bookquestions@oreilly.com to comment or ask technical questions about this book.
Mark and Neal would like to thank all the people who attended our classes, workshops, conference sessions, and user group meetings, as well as all the other people who listened to versions of this material and provided invaluable feedback. We would also like to thank the publishing team at O’Reilly, who made this as painless an experience as writing a book can be. We would also like to thank No Stuff Just Fluff director Jay Zimmerman for creating a conference series that allows good technical content to grow and spread, and all the other speakers whose feedback and tear-soaked shoulders we appreciate. We would also like to thank a few random oases of sanity-preserving and idea-sparking groups that have names like Pasty Geeks and the Hacker B&B.
Acknowledgments from Mark Richards
In addition to the preceding acknowledgments, I would like to thank my lovely wife, Rebecca. Taking everything else on at home and sacrificing the opportunity to work on your own book allowed me to do additional consulting gigs and speak at more conferences and training classes, giving me the opportunity to practice and hone the material for this book. You are the best.
Acknowledgments from Neal Ford
Neal would like to thank his extended family, ThoughtWorks as a collective, and Rebecca Parsons and Martin Fowler as individual parts of it. ThoughtWorks is an extraordinary group who manage to produce value for customers while keeping a keen eye toward why things work so that we can improve them. ThoughtWorks supported this book in myriad ways and continues to grow ThoughtWorkers who challenge and inspire every day. Neal would also like to thank our neighborhood cocktail club for a regular escape from routine. Lastly, Neal would like to thank his wife, Candy, whose tolerance for things like book writing and conference speaking apparently knows no bounds. For decades she’s kept me grounded and sane enough to function, and I hope she will for decades more as the love of my life.
Introduction
The job “software architect” appears near the top of numerous lists of best jobs across the world. Yet when readers look at the other jobs on those lists (like nurse practitioner or finance manager), there’s a clear career path for them. Why is there no path for software architects?
First, the industry doesn’t have a good definition of software architecture itself. When we teach foundational classes, students often ask for a concise definition of what a software architect does, and we have adamantly refused to give one. And we’re not the only ones. In his whitepaper “Who Needs an Architect?” Martin Fowler famously refused to try to define it, instead falling back on the well-known quote:
Architecture is about the important stuff…whatever that is.
-Ralph Johnson
When pressed, we created the mindmap shown in Figure 1-1, which is woefully incomplete but indicative of the scope of software architecture. We will, in fact, offer our definition of software architecture shortly.
Second, as illustrated in the mindmap, the role of software architect embodies a massive amount and scope of responsibility that continues to expand. A decade ago, software architects dealt only with the purely technical aspects of architecture, like modularity, components, and patterns. Since then, because of new architectural styles that leverage a wider swath of capabilities (like microservices), the role of software architect has expanded. We cover the many intersections of architecture and the remainder of the organization in “Intersection of Architecture and…”.
Figure 1-1. The responsibilities of a software architect encompass technical abilities, soft skills, operational awareness, and a host of others
Third, software architecture is a constantly moving target because of the rapidly evolving software development ecosystem. Any definition cast today will be hopelessly outdated in a few years. The Wikipedia definition of software architecture provides a reasonable overview, but many statements are outdated, such as “Software architecture is about making fundamental structural choices which are costly to change once implemented.” Yet architects designed modern architectural styles like microservices with the idea of incremental change built in; it is no longer expensive to make structural changes in microservices. Of course, that capability means trade-offs with other concerns, such as coupling. Many books on software architecture treat it as a static problem; once solved, we can safely ignore it. However, we recognize the inherent dynamic nature of software architecture, including the definition itself, throughout the book.
Fourth, much of the material about software architecture has only historical relevance. Readers of the Wikipedia page won’t fail to notice the bewildering array of acronyms and cross-references to an entire universe of knowledge. Yet, many of these acronyms represent outdated or failed attempts. Even solutions that were perfectly valid a few years ago cannot work now because the context has changed. The history of software architecture is littered with things architects have tried, only to realize the damaging side effects. We cover many of those lessons in this book.
Why a book on software architecture fundamentals now? The scope of software architecture isn’t the only part of the development world that constantly changes. New technologies, techniques, capabilities…in fact, it’s easier to find things that haven’t changed over the last decade than to list all the changes. Software architects must make decisions within this constantly changing ecosystem. Because everything changes, including foundations upon which we make decisions, architects should reexamine some core axioms that informed earlier writing about software architecture. For example, earlier books about software architecture don’t consider the impact of DevOps because it didn’t exist when these books were written.
When studying architecture, readers must keep in mind that, like much art, it can only be understood in context. Many of the decisions architects made were based on realities of the environment they found themselves in. For example, one of the major goals of late 20th-century architecture included making the most efficient use of shared resources, because all the infrastructure at the time was expensive and commercial: operating systems, application servers, database servers, and so on. Imagine strolling into a 2002 data center and telling the head of operations, “Hey, I have a great idea for a revolutionary style of architecture, where each service runs on its own isolated machinery, with its own dedicated database (describing what we now know as microservices). So, that means I’ll need 50 licenses for Windows, another 30 application server licenses, and at least 50 database server licenses.” In 2002, trying to build an architecture like microservices would be inconceivably expensive. Yet, with the advent of open source during the intervening years, coupled with updated engineering practices via the DevOps revolution, we can reasonably build an architecture as described. Readers should keep in mind that all architectures are a product of their context.
Defining Software Architecture
The industry as a whole has struggled to precisely define “software architecture.” Some architects refer to software architecture as the blueprint of the system, while others define it as the roadmap for developing a system. The issue with these common definitions is understanding what the blueprint or roadmap actually contains. For example, what is analyzed when an architect analyzes an architecture?
Figure 1-2 illustrates a way to think about software architecture. In this definition, software architecture consists of the structure of the system (denoted as the heavy black lines supporting the architecture), combined with architecture characteristics (“-ilities”) the system must support, architecture decisions, and finally design principles.
Figure 1-2. Architecture consists of the structure combined with architecture characteristics (“-ilities”), architecture decisions, and design principles
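One way to make the four dimensions of this definition concrete is to record them side by side. The following is a minimal Python sketch rather than anything taken from the book: the class name `ArchitectureDescription`, its field names, and the sample value are assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArchitectureDescription:
    """Hypothetical record of the four dimensions named in the definition."""
    structure: List[str] = field(default_factory=list)          # architecture style(s)
    characteristics: List[str] = field(default_factory=list)    # the "-ilities"
    decisions: List[str] = field(default_factory=list)          # rules
    design_principles: List[str] = field(default_factory=list)  # guidelines

    def missing_dimensions(self) -> List[str]:
        """Return the names of the dimensions that have not been described yet."""
        dimensions = {
            "structure": self.structure,
            "characteristics": self.characteristics,
            "decisions": self.decisions,
            "design_principles": self.design_principles,
        }
        return [name for name, items in dimensions.items() if not items]


# A description that names only the style leaves three dimensions undescribed.
only_style = ArchitectureDescription(structure=["microservices"])
print(only_style.missing_dimensions())
```

A description that names only a style reports the other three dimensions as missing, which reflects the point that structure is just one of the four parts of the definition.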
The structure of the system, as illustrated in Figure 1-3, refers to the type of architecture style (or styles) the system is implemented in (such as microservices, layered, or microkernel). Describing an architecture solely by the structure does not wholly elucidate an architecture. For example, suppose an architect is asked to describe an architecture, and that architect responds “it’s a microservices architecture.” Here, the architect is only talking about the structure of the system, but not the architecture of the system. Knowledge of the architecture characteristics, architecture decisions, and design principles is also needed to fully understand the architecture of the system.
Figure 1-3. Structure refers to the type of architecture styles used in the system
Architecture characteristics are another dimension of defining software architecture (see Figure 1-4). The architecture characteristics define the success criteria of a system, which are generally orthogonal to the functionality of the system. Notice that none of the characteristics listed requires knowledge of the functionality of the system, yet they are required in order for the system to function properly. Architecture characteristics are so important that we’ve devoted several chapters in this book to understanding and defining them.
Figure 1-4. Architecture characteristics refer to the “-ilities” that the system must support
The next factor that defines software architecture is architecture decisions. Architecture decisions define the rules for how a system should be constructed. For example, an architect might make an architecture decision that only the business and services layers within a layered architecture can access the database (see Figure 1-5), restricting the presentation layer from making direct database calls. Architecture decisions form the constraints of the system and direct the development teams on what is and what isn’t allowed.
Figure 1-5. Architecture decisions are rules for constructing systems
If a particular architecture decision cannot be implemented in one part of the system due to some condition or other constraint, that decision (or rule) can be broken through something called a variance. Most organizations have variance models that are used by an architecture review board (ARB) or chief architect. Those models formalize the process for seeking a variance to a particular standard or architecture decision. An exception to a particular architecture decision is analyzed by the ARB (or chief architect if no ARB exists) and is either approved or denied based on justifications and trade-offs.
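The database-access rule from Figure 1-5 and the variance process can be sketched as a small check. This is an illustrative Python sketch under stated assumptions: the layer names, the component names, the `Variance` record, and the `violations` function are all hypothetical and do not come from the book or from any real tool.

```python
from dataclasses import dataclass

# Assumed layer names; the rule allows only these layers to call the database.
ALLOWED_DB_CALLERS = {"business", "services"}


@dataclass(frozen=True)
class Variance:
    """Hypothetical record of an approved exception (for example, by an ARB)."""
    component: str
    justification: str


def violations(db_calls, variances=()):
    """Return the (component, layer) pairs that break the database-access rule.

    db_calls: iterable of (component, layer) pairs that call the database directly.
    variances: approved Variance records; their components are exempt from the rule.
    """
    exempt = {v.component for v in variances}
    return [
        (component, layer)
        for component, layer in db_calls
        if layer not in ALLOWED_DB_CALLERS and component not in exempt
    ]


# Made-up components, used only to exercise the check.
calls = [
    ("OrderService", "services"),
    ("CustomerScreen", "presentation"),
    ("ReportExport", "presentation"),
]
approved = [Variance("ReportExport", "bulk export needs direct read access")]

print(violations(calls))            # both presentation-layer components are flagged
print(violations(calls, approved))  # the approved variance exempts ReportExport
```

The design choice here is that a variance exempts a single named component rather than relaxing the rule for a whole layer, so the decision stays in force everywhere else.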
The last factor in the definition of architecture is design principles. A design principle differs from an architecture decision in that a design principle is a guideline rather than a hard-and-fast rule. For example, the design principle illustrated in Figure 1-6 states that the development teams should leverage asynchronous messaging between services within a microservices architecture to increase performance. An architecture decision (rule) could never cover every condition and option for communication between services, so a design principle can be used to provide guidance for the preferred method (in this case, asynchronous messaging) while still allowing the developer to choose a more appropriate communication protocol (such as REST or gRPC) given a specific circumstance.
Figure 1-6. Design principles are guidelines for constructing systems 图 1-6. 设计原则是构建系统的指导方针
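One way to picture the difference between an architecture decision and a design principle is in how a deviation is handled. The sketch below is purely illustrative: the `Finding` type, the `evaluate` function, and the component and guideline names are hypothetical and do not come from the figures. It assumes that a broken decision counts as a violation unless a variance has been approved, while a departure from a principle only produces an advisory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    component: str     # part of the system where the deviation was observed
    guideline: str     # name of the decision or principle involved
    is_decision: bool  # True for an architecture decision (rule), False for a design principle


def evaluate(finding: Finding, approved_variances: set) -> str:
    """Classify a deviation as 'advisory', 'variance', or 'violation'."""
    if not finding.is_decision:
        # Design principles are guidance; deviating is allowed when circumstances call for it.
        return "advisory"
    if (finding.component, finding.guideline) in approved_variances:
        # The ARB (or chief architect) approved an exception for this component.
        return "variance"
    return "violation"


# Hypothetical data: one approved variance for a single component.
variances = {("reporting-ui", "no-direct-db-access")}

print(evaluate(Finding("reporting-ui", "no-direct-db-access", True), variances))
print(evaluate(Finding("orders-ui", "no-direct-db-access", True), variances))
print(evaluate(Finding("pricing-service", "prefer-async-messaging", False), variances))
```

The design choice worth noting is that the variance list is data kept separately from the rule itself, so an approved exception does not weaken the decision for the rest of the system.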
Expectations of an Architect 建筑师的期望
Defining the role of a software architect presents as much difficulty as defining software architecture. It can range from expert programmer up to defining the strategic technical direction for the company. Rather than waste time on the fool’s errand of defining the role, we recommend focusing on the expectations of an architect. 定义软件架构师的角色与定义软件架构一样困难。它可以从专家程序员到为公司定义战略技术方向不等。与其浪费时间在定义角色的无用功上,我们建议关注架构师的期望。
There are eight core expectations placed on a software architect, irrespective of any given role, title, or job description: 软件架构师有八个核心期望,无论任何特定角色、头衔或职位描述:
Make architecture decisions 做出架构决策
Continually analyze the architecture 持续分析架构
Keep current with latest trends 保持对最新趋势的关注
Ensure compliance with decisions 确保遵守决策
Diverse exposure and experience 多样的接触和经验
Have business domain knowledge 拥有业务领域知识
Possess interpersonal skills 具备人际交往能力
Understand and navigate politics 理解和应对政治
The first key to effectiveness and success in the software architect role is understanding and practicing each of these expectations.
Make Architecture Decisions 做出架构决策
An architect is expected to define the architecture decisions and design principles used to guide technology decisions within the team, the department, or across the enterprise. 架构师需要定义用于指导团队、部门或整个企业技术决策的架构决策和设计原则。
Guide is the key operative word in this first expectation. An architect should guide rather than specify technology choices. For example, an architect might make a decision to use React.js for frontend development. In this case, the architect is making a technical decision rather than an architectural decision or design principle that will help the development team make choices. An architect should instead instruct development teams to use a reactive-based framework for frontend web development, hence guiding the development team in making the choice between Angular, Elm, React.js, Vue, or any of the other reactive-based web frameworks. “指导”是这个第一期望中的关键操作词。架构师应该指导而不是指定技术选择。例如,架构师可能决定使用 React.js 进行前端开发。在这种情况下,架构师是在做一个技术决策,而不是一个架构决策或设计原则,这将帮助开发团队做出选择。架构师应该指示开发团队使用基于响应式的框架进行前端网页开发,从而指导开发团队在 Angular、Elm、React.js、Vue 或其他任何基于响应式的网页框架之间做出选择。
Guiding technology choices through architecture decisions and design principles is difficult. The key to making effective architectural decisions is asking whether the architecture decision is helping to guide teams in making the right technical choice or whether the architecture decision makes the technical choice for them. That said, an architect on occasion might need to make specific technology decisions in order to preserve a particular architectural characteristic such as scalability, performance, or availability. In this case it would still be considered an architectural decision, even though it specifies a particular technology. Architects often struggle with finding the correct line, so Chapter 19 is entirely about architecture decisions.
Continually Analyze the Architecture 持续分析架构
An architect is expected to continually analyze the architecture and current technology environment and then recommend solutions for improvement. 架构师需要不断分析架构和当前的技术环境,然后推荐改进方案。
This expectation of an architect refers to architecture vitality, which assesses how viable the architecture that was defined three or more years ago is today, given changes in both business and technology. In our experience, not enough architects focus their energies on continually analyzing existing architectures. As a result, most architectures experience elements of structural decay, which occurs when developers make coding or design changes that impact the required architectural characteristics, such as performance, availability, and scalability. 这种对架构师的期望指的是架构的活力,它评估三年或更早之前定义的架构在今天的可行性,考虑到业务和技术的变化。根据我们的经验,许多架构师并没有将精力集中在持续分析现有架构上。因此,大多数架构都会经历结构衰退的元素,这发生在开发人员进行编码或设计更改时,这些更改影响了所需的架构特性,如性能、可用性和可扩展性。
Other aspects of this expectation that architects frequently forget are testing and release environments. Agility for code modification has obvious benefits, but if it takes teams weeks to test changes and months for releases, then architects cannot achieve agility in the overall architecture.
An architect must holistically analyze changes in technology and problem domains to determine the soundness of the architecture. While this kind of consideration rarely appears in a job posting, architects must meet this expectation to keep applications relevant. 架构师必须全面分析技术和问题领域的变化,以确定架构的合理性。虽然这种考虑在职位发布中很少出现,但架构师必须满足这一期望,以保持应用程序的相关性。
Keep Current with Latest Trends 保持对最新趋势的关注
An architect is expected to keep current with the latest technology and industry trends. 建筑师应保持对最新技术和行业趋势的了解。
Developers must keep up to date on the latest technologies they use on a daily basis to remain relevant (and to retain a job!). An architect has an even more critical requirement to keep current on the latest technical and industry trends. The decisions an architect makes tend to be long-lasting and difficult to change. Understanding and following key trends helps the architect prepare for the future and make the correct decision. 开发人员必须及时了解他们每天使用的最新技术,以保持相关性(并保住工作!)。架构师有一个更为关键的要求,即保持对最新技术和行业趋势的了解。架构师所做的决策往往是持久的且难以更改。理解和跟随关键趋势有助于架构师为未来做好准备并做出正确的决策。
Tracking trends and keeping current with those trends is hard, particularly for a software architect. In Chapter 24 we discuss various techniques and resources on how to do this. 跟踪趋势并保持与这些趋势的同步是困难的,特别是对于软件架构师。在第 24 章中,我们讨论了各种技术和资源来实现这一点。
Ensure Compliance with Decisions 确保遵守决策
An architect is expected to ensure compliance with architecture decisions and design principles. 架构师需要确保遵循架构决策和设计原则。
Ensuring compliance means that the architect is continually verifying that development teams are following the architecture decisions and design principles defined, documented, and communicated by the architect. Consider the scenario where an architect makes a decision to restrict access to the database in a layered architecture to only the business and services layers (and not the presentation layer). This means that the presentation layer must go through all layers of the architecture to make even the simplest of database calls. A user interface developer might disagree with this decision and access the database (or the persistence layer) directly for performance reasons. However, the architect made that architecture decision for a specific reason: to control change. By closing the layers, database changes can be made without impacting the presentation layer. If the architect does not ensure compliance with architecture decisions, violations like this can occur, the architecture will not meet the required architectural characteristics (“-ilities”), and the application or system will not work as expected.
In Chapter 6 we talk more about measuring compliance using automated fitness functions and automated tools. 在第六章中,我们将更多地讨论使用自动化适应性函数和自动化工具来衡量合规性。
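To make the compliance check concrete, here is a minimal sketch of how the database-access decision from the scenario above might be verified automatically. It assumes the dependencies between layers have already been extracted as `(source_layer, target)` pairs by some analysis step that is not shown; the layer names, `DB_ACCESS_ALLOWED`, and `find_violations` are hypothetical names chosen for illustration, not the tooling covered in Chapter 6.

```python
# Layers allowed to call the database directly, per the decision in the scenario.
DB_ACCESS_ALLOWED = frozenset({"business", "services"})


def find_violations(dependencies):
    """Return the (source_layer, target) pairs that break the database-access decision.

    `dependencies` is an iterable of (source_layer, target) pairs, for example
    as produced by a static-analysis step (that extraction is assumed here).
    """
    return [
        (source, target)
        for source, target in dependencies
        if target == "database" and source not in DB_ACCESS_ALLOWED
    ]


# Hypothetical dependencies observed in a code base.
observed = [
    ("presentation", "business"),
    ("business", "database"),
    ("presentation", "database"),  # a UI developer bypassing the closed layers
]

violations = find_violations(observed)
print(violations)
```

Run as part of a build, a non-empty result could fail the build or open a conversation about a variance, depending on how strictly the team enforces the decision.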
Diverse Exposure and Experience 多样的曝光和经验
An architect is expected to have exposure to multiple and diverse technologies, frameworks, platforms, and environments. 架构师应当接触多种多样的技术、框架、平台和环境。
This expectation does not mean an architect must be an expert in every framework, platform, and language, but rather that an architect must at least be familiar with a variety of technologies. Most environments these days are heterogeneous, and at a minimum an architect should know how to interface with multiple systems and services, irrespective of the language, platform, and technology those systems or services are written in. 这种期望并不意味着架构师必须精通每个框架、平台和语言,而是架构师至少应该熟悉多种技术。如今大多数环境都是异构的,架构师至少应该知道如何与多个系统和服务进行接口,无论这些系统或服务是用什么语言、平台和技术编写的。
One of the best ways of mastering this expectation is for the architect to stretch their comfort zone. Focusing only on a single technology or platform is a safe haven. An effective software architect should be aggressive in seeking out opportunities to gain experience in multiple languages, platforms, and technologies. A good way of mastering this expectation is to focus on technical breadth rather than technical depth. Technical breadth includes the stuff you know about, but not at a detailed level, combined with the stuff you know a lot about. For example, it is far more valuable for an architect to be familiar with 10 different caching products and the associated pros and cons of each rather than to be an expert in only one of them. 掌握这种期望的最佳方法之一是让架构师拓展他们的舒适区。仅专注于单一技术或平台是一种安全的避风港。有效的软件架构师应该积极寻求在多种语言、平台和技术中获得经验的机会。掌握这种期望的一个好方法是关注技术广度而不是技术深度。技术广度包括你了解的内容,但不是详细级别的内容,结合你非常了解的内容。例如,架构师熟悉 10 种不同的缓存产品及其各自的优缺点,远比仅精通其中一种产品更有价值。
Have Business Domain Knowledge 拥有业务领域知识
An architect is expected to have a certain level of business domain expertise. 架构师应该具备一定程度的业务领域专业知识。
Effective software architects understand not only technology but also the business domain of a problem space. Without business domain knowledge, it is difficult to understand the business problem, goals, and requirements, making it difficult to design an effective architecture to meet the requirements of the business. Imagine being an architect at a large financial institution and not understanding common financial terms such as an average directional index, aleatory contracts, rates rally, or even nonpriority debt. Without this knowledge, an architect cannot communicate with stakeholders and business users and will quickly lose credibility. 有效的软件架构师不仅要理解技术,还要理解问题领域的业务领域。如果没有业务领域知识,就很难理解业务问题、目标和需求,从而难以设计出有效的架构来满足业务的需求。想象一下,作为一家大型金融机构的架构师,却不理解诸如平均方向指数、偶然合同、利率反弹或甚至非优先债务等常见金融术语。没有这些知识,架构师无法与利益相关者和业务用户进行沟通,并且很快会失去可信度。
The most successful architects we know are those who have broad, hands-on technical knowledge coupled with a strong knowledge of a particular domain. These software architects are able to effectively communicate with C-level executives and business users using the domain knowledge and language that these stakeholders know and understand. This in turn creates a strong level of confidence that the software architect knows what they are doing and is competent to create an effective and correct architecture. 我们所知道的最成功的架构师是那些拥有广泛的实践技术知识以及对特定领域有深入了解的人。这些软件架构师能够有效地与 C 级高管和业务用户沟通,使用这些利益相关者所熟悉和理解的领域知识和语言。这反过来又增强了对软件架构师的信心,认为他们知道自己在做什么,并有能力创建有效且正确的架构。
Possess Interpersonal Skills 具备人际交往能力
An architect is expected to possess exceptional interpersonal skills, including teamwork, facilitation, and leadership. 建筑师应具备卓越的人际交往能力,包括团队合作、促进和领导能力。
Having exceptional leadership and interpersonal skills is a difficult expectation for most developers and architects. As technologists, developers and architects like to solve technical problems, not people problems. However, as Gerald Weinberg was famous for saying, “no matter what they tell you, it’s always a people problem.” An architect is not only expected to provide technical guidance to the team, but is also expected to lead the development teams through the implementation of the architecture. Leadership skills are at least half of what it takes to become an effective software architect, regardless of the role or title the architect has. 拥有卓越的领导能力和人际交往能力对大多数开发人员和架构师来说是一个困难的期望。作为技术人员,开发人员和架构师喜欢解决技术问题,而不是人际问题。然而,正如杰拉尔德·温伯格所说的,“无论他们告诉你什么,这始终是一个人际问题。”架构师不仅被期望为团队提供技术指导,还被期望在架构实施过程中领导开发团队。领导能力至少占据了成为有效软件架构师所需技能的一半,无论架构师的角色或头衔是什么。
The industry is flooded with software architects, all competing for a limited number of architecture positions. Having strong leadership and interpersonal skills is a good way for an architect to differentiate themselves from other architects and stand out from the crowd. We’ve known many software architects who are excellent technologists but are ineffective architects due to the inability to lead teams, coach and mentor developers, and effectively communicate ideas and architecture decisions and principles. Needless to say, those architects have difficulties holding a position or job. 行业中充斥着软件架构师,所有人都在争夺有限的架构职位。拥有强大的领导能力和人际交往技巧是架构师与其他架构师区分开来并脱颖而出的好方法。我们认识许多优秀的技术专家,但由于无法领导团队、指导和辅导开发人员,以及有效沟通想法和架构决策与原则,他们成为了无效的架构师。无需多说,这些架构师在保持职位或工作方面面临困难。
Understand and Navigate Politics 理解和驾驭政治
An architect is expected to understand the political climate of the enterprise and be able to navigate the politics. 架构师需要理解企业的政治气候,并能够应对政治。
It might seem rather strange to talk about negotiation and navigating office politics in a book about software architecture. To illustrate how important and necessary negotiation skills are, consider the scenario where a developer makes the decision to leverage the strategy pattern to reduce the overall cyclomatic complexity of a particular piece of complex code. Who really cares? One might applaud the developer for using such a pattern, but in almost all cases the developer does not need to seek approval for such a decision.
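For readers who have not seen the developer-level decision mentioned above, the following sketch shows the kind of change involved. The shipping-cost domain, the rates, and all function names are invented for illustration. The first function grows a new branch for every case, which raises its cyclomatic complexity; the second selects a small strategy function from a lookup table.

```python
from typing import Callable, Dict


# Before: each new shipping method adds another branch to this one function.
def shipping_cost_branching(method: str, weight_kg: float) -> float:
    if method == "standard":
        return 5.0 + 1.0 * weight_kg
    elif method == "express":
        return 10.0 + 2.0 * weight_kg
    elif method == "overnight":
        return 25.0 + 3.0 * weight_kg
    raise ValueError(f"unknown shipping method: {method}")


# After: each strategy is a small, separately testable function.
def standard(weight_kg: float) -> float:
    return 5.0 + 1.0 * weight_kg


def express(weight_kg: float) -> float:
    return 10.0 + 2.0 * weight_kg


def overnight(weight_kg: float) -> float:
    return 25.0 + 3.0 * weight_kg


STRATEGIES: Dict[str, Callable[[float], float]] = {
    "standard": standard,
    "express": express,
    "overnight": overnight,
}


def shipping_cost(method: str, weight_kg: float) -> float:
    """Pick the strategy by name and delegate the calculation to it."""
    try:
        strategy = STRATEGIES[method]
    except KeyError:
        raise ValueError(f"unknown shipping method: {method}") from None
    return strategy(weight_kg)


print(shipping_cost("express", 2.0))
```

A change like this stays inside one module and affects no other team, which is why it rarely needs anyone's approval.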
Now consider the scenario where an architect, responsible for a large customer relationship management system, is having issues controlling database access from other systems, securing certain customer data, and making any database schema change because too many other systems are using the CRM database. The architect therefore makes the decision to create what are called application silos, where each application database is only accessible from the application owning that database. Making this decision will give the architect better control over the customer data, security, and change control. However, unlike the previous developer scenario, this decision will also be challenged by almost everyone in the company (with the possible exception of the CRM application team, of course). Other applications need the customer management data. If those applications are no longer able to access the database directly, they must now ask the CRM system for the data, requiring remote access calls through REST, SOAP, or some other remote access protocol.
The main point is that almost every decision an architect makes will be challenged. Architectural decisions will be challenged by product owners, project managers, and business stakeholders due to increased costs or increased effort (time) involved. Architectural decisions will also be challenged by developers who feel their approach is better. In either case, the architect must navigate the politics of the company and apply basic negotiation skills to get most decisions approved. This fact can be very frustrating to a software architect, because most decisions made as a developer did not require approval or even a review. Programming aspects such as code structure, class design, design pattern selection, and sometimes even language choice are all part of the art of programming. However, an architect, now finally able to make broad and important decisions, must justify and fight for almost every one of those decisions. Negotiation skills, like leadership skills, are so critical and necessary that we’ve dedicated an entire chapter in the book to understanding them (see Chapter 23).
Intersection of Architecture and... 架构与...的交集
The scope of software architecture has grown over the last decade to encompass more and more responsibility and perspective. A decade ago, the typical relationship between architecture and operations was contractual and formal, with lots of bureaucracy. Most companies, trying to avoid the complexity of hosting their own operations, frequently outsourced operations to a third-party company, with contractual obligations for service-level agreements, such as uptime, scale, responsiveness, and a host of other important architectural characteristics. Now, architectures such as microservices freely leverage concerns that were formerly solely operational. For example, elastic scale was once painfully built into architectures (see Chapter 15), while microservices handle it less painfully via a liaison between architects and DevOps.
History: Pets.com and Why We Have Elastic Scale 历史:Pets.com 及我们为何需要弹性扩展
The history of software development contains rich lessons, both good and bad. We assume that current capabilities (like elastic scale) just appeared one day because of some clever developer, but those ideas were often born of hard lessons. Pets.com represents an early example of hard lessons learned. Pets.com appeared in the early days of the internet, hoping to become the Amazon.com of pet supplies. Fortunately, they had a brilliant marketing department, which invented a compelling mascot: a sock puppet with a microphone that said irreverent things. The mascot became a superstar, appearing in public at parades and national sporting events. 软件开发的历史包含了丰富的教训,既有好的也有坏的。我们认为当前的能力(如弹性扩展)是某个聪明的开发者某天突然出现的,但这些想法往往是经过艰难的教训而产生的。Pets.com 就是一个早期的艰难教训的例子。Pets.com 出现在互联网的早期,试图成为宠物用品的 Amazon.com。幸运的是,他们拥有一个出色的市场营销部门,创造了一个引人注目的吉祥物:一个带麦克风的袜子玩偶,讲一些不敬的话。这个吉祥物成为了超级明星,在游行和全国体育赛事中公开亮相。
Unfortunately, management at Pets.com apparently spent all the money on the mascot, not on infrastructure. Once orders started pouring in, they weren’t prepared. The website was slow, transactions were lost, deliveries delayed, and so on…pretty much the worst-case scenario. So bad, in fact, that the business closed shortly after its disastrous Christmas rush, selling the only remaining valuable asset (the mascot) to a competitor. 不幸的是,Pets.com 的管理层显然把所有的钱都花在了吉祥物上,而不是基础设施上。一旦订单开始涌入,他们就没有准备好。网站运行缓慢,交易丢失,交付延迟,等等……几乎是最糟糕的情况。实际上糟糕到在灾难性的圣诞购物潮后不久就关闭了业务,将唯一剩下的有价值资产(吉祥物)卖给了竞争对手。
What the company needed was elastic scale: the ability to spin up more instances of resources, as needed. Cloud providers offer this feature as a commodity, but in the early days of the internet, companies had to manage their own infrastructure, and many fell victim to a previously unheard of phenomenon: too much success can kill the business. Pets.com and other similar horror stories led engineers to develop the frameworks that architects enjoy now. 公司需要的是弹性扩展:根据需要启动更多资源实例的能力。云服务提供商将此功能作为商品提供,但在互联网的早期,企业必须管理自己的基础设施,许多公司成为了一个前所未有现象的受害者:过多的成功可能会扼杀业务。Pets.com 和其他类似的恐怖故事促使工程师们开发了现在架构师所享用的框架。
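As a rough illustration of what "spinning up more instances as needed" means, the sketch below computes how many instances a load level calls for. It is an assumption-laden toy, not how any particular cloud provider implements scaling: the `desired_instances` function, the per-instance capacity figure, and the minimum and maximum bounds are all hypothetical.

```python
import math


def desired_instances(requests_per_second: float,
                      capacity_per_instance: float,
                      min_instances: int = 1,
                      max_instances: int = 20) -> int:
    """Return how many instances are needed for the current load, within bounds."""
    if capacity_per_instance <= 0:
        raise ValueError("capacity_per_instance must be positive")
    # Round up so that a partially loaded instance still counts as a whole one.
    needed = math.ceil(requests_per_second / capacity_per_instance)
    # Never drop below the floor or grow past the ceiling.
    return max(min_instances, min(max_instances, needed))


# Hypothetical load levels, assuming each instance handles 100 requests per second.
for load in (50, 950, 5000):
    print(load, desired_instances(load, capacity_per_instance=100))
```

The upper bound matters as much as the scaling itself: without it, a traffic spike turns directly into an unbounded infrastructure bill.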
The following sections delve into some of the newer intersections between the role of architect and other parts of an organization, highlighting new capabilities and responsibilities for architects. 以下部分深入探讨了建筑师角色与组织其他部分之间的一些新交集,突出了建筑师的新能力和责任。
Engineering Practices 工程实践
Traditionally, software architecture was separate from the development process used to create software. Dozens of popular methodologies exist to build software, including Waterfall and many flavors of Agile (such as Scrum, Extreme Programming, Lean, and Crystal), which mostly don’t impact software architecture. 传统上,软件架构与用于创建软件的开发过程是分开的。存在数十种流行的方法论来构建软件,包括瀑布模型和多种敏捷方法(如 Scrum、极限编程、精益和 Crystal),这些方法大多不会影响软件架构。
However, over the last few years, engineering advances have thrust process concerns upon software architecture. It is useful to separate software development process from engineering practices. By process, we mean how teams are formed and managed, how meetings are conducted, and workflow organization; it refers to the mechanics of how people organize and interact. Software engineering practices, on the other hand, refer to process-agnostic practices that have illustrated, repeatable benefit. For example, continuous integration is a proven engineering practice that doesn’t rely on a particular process. 然而,在过去的几年里,工程进步将过程问题推向了软件架构。将软件开发过程与工程实践分开是有益的。我们所说的过程是指团队的组建和管理、会议的进行以及工作流程的组织;它涉及人们如何组织和互动的机制。另一方面,软件工程实践是指那些与过程无关的实践,这些实践已经证明具有可重复的好处。例如,持续集成是一种经过验证的工程实践,它不依赖于特定的过程。
The Path from Extreme Programming to Continuous Delivery 从极限编程到持续交付的路径
The origins of Extreme Programming (XP) nicely illustrate the difference between process and engineering. In the early 1990s, a group of experienced software developers, led by Kent Beck, started questioning the dozens of different development processes popular at the time. In their experience, it seemed that none of them created repeatably good outcomes. One of the XP founders said that choosing one of the extant processes was “no more guarantee of project success than flipping a coin.” They decided to rethink how to build software, and they started the XP project in March of 1996. To inform their process, they rejected the conventional wisdom and focused on the practices that led to project success in the past, pushed to the extreme. Their reasoning was that they’d seen a correlation on previous projects between more tests and higher quality. Thus, the XP approach to testing took the practice to the extreme: do test-first development, ensuring that all code is tested before it enters the code base.
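A small example may help show what test-first development looks like in practice. In the sketch below, the tests appear first and describe the desired behavior; the function is written afterward, with just enough code to make them pass. The `line_item_total` function and its rules are invented for illustration, and plain assertions are used so the example runs without a test framework.

```python
# Step 1: the tests are written first and describe the behavior we want.
def test_total_multiplies_price_by_quantity():
    assert line_item_total(unit_price_cents=250, quantity=3) == 750


def test_negative_quantity_is_rejected():
    try:
        line_item_total(unit_price_cents=250, quantity=-1)
    except ValueError:
        return
    raise AssertionError("expected ValueError for a negative quantity")


# Step 2: only then is the code written, just enough to make the tests pass.
def line_item_total(unit_price_cents: int, quantity: int) -> int:
    if quantity < 0:
        raise ValueError("quantity must not be negative")
    return unit_price_cents * quantity


# Step 3: run the tests; in a real project a test runner would do this.
for test in (test_total_multiplies_price_by_quantity,
             test_negative_quantity_is_rejected):
    test()
print("all tests passed")
```

Before step 2 exists, running the tests fails with a `NameError`, and that failure is the point: it confirms the tests actually exercise the code that is about to be written.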
XP was lumped into other popular Agile processes that shared similar perspectives, but it was one of the few methodologies that included engineering practices such as automation, testing, continuous integration, and other concrete, experience-based techniques. The efforts to continue advancing the engineering side of software development continued with the book Continuous Delivery (Addison-Wesley Professional), an updated version of many XP practices, and came to fruition in the DevOps movement. In many ways, the DevOps revolution occurred when operations adopted engineering practices originally espoused by XP: automation, testing, declarative single source of truth, and others.
We strongly support these advances, which form the incremental steps that will eventually graduate software development into a proper engineering discipline. 我们强烈支持这些进展,这些进展构成了逐步推进的软件开发,最终将其提升为一个真正的工程学科。
Focusing on engineering practices is important. First, software development lacks many of the features of more mature engineering disciplines. For example, civil engineers can predict structural change with much more accuracy than software engineers can predict similarly important aspects of software structure. Second, one of the Achilles’ heels of software development is estimation: how much time, how many resources, how much money? Part of this difficulty lies with antiquated accounting practices that cannot accommodate the exploratory nature of software development, but another part is that we’re traditionally bad at estimation, at least in part because of unknown unknowns.
…because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns, the ones we don’t know we don’t know.
-Former United States Secretary of Defense Donald Rumsfeld -前美国国防部长唐纳德·拉姆斯菲尔德
Unknown unknowns are the nemesis of software systems. Many projects start with a list of known unknowns: things developers must learn about the domain and technology they know are upcoming. However, projects also fall victim to unknown unknowns: things no one knew were going to crop up yet have appeared unexpectedly. This is why all “Big Design Up Front” software efforts suffer: architects cannot design for unknown unknowns. To quote Mark (one of your authors): 未知的未知是软件系统的克星。许多项目以已知未知的清单开始:开发人员必须了解他们知道即将出现的领域和技术。然而,项目也会成为未知未知的受害者:没有人知道会出现的事情却意外地出现了。这就是为什么所有“前期大设计”的软件工作都会遭受困扰:架构师无法为未知的未知进行设计。引用马克(你们的作者之一)的话:
All architectures become iterative because of unknown unknowns; Agile just recognizes this and does it sooner.
Thus, while process is mostly separate from architecture, an iterative process fits the nature of software architecture better. Teams trying to build a modern system such as microservices using an antiquated process like Waterfall will encounter a great deal of friction, because such a process ignores the reality of how software comes together.
Often, the architect is also the technical leader on projects and therefore determines the engineering practices the team uses. Just as architects must carefully consider the problem domain before choosing an architecture, they must also ensure that the architectural style and engineering practices form a symbiotic mesh. For example, a microservices architecture assumes automated machine provisioning, automated testing and deployment, and a raft of other assumptions. Trying to build one of these architectures with an antiquated operations group, manual processes, and little testing creates tremendous friction and challenges to success. Just as different problem domains lend themselves toward certain architectural styles, engineering practices have the same kind of symbiotic relationship. 通常,架构师也是项目的技术领导,因此决定团队使用的工程实践。正如架构师在选择架构之前必须仔细考虑问题领域,他们还必须确保架构风格和工程实践形成一种共生的网络。例如,微服务架构假设自动化的机器配置、自动化测试和部署,以及一系列其他假设。试图在一个过时的运营团队、手动流程和很少测试的情况下构建这些架构,会造成巨大的摩擦和成功的挑战。正如不同的问题领域倾向于某些架构风格,工程实践也有同样的共生关系。
The evolution of thought leading from Extreme Programming to Continuous Delivery continues. Recent advances in engineering practices allow new capabilities within architecture. Neal’s most recent book, Building Evolutionary Architectures (O’Reilly), highlights new ways to think about the intersection of engineering practices and architecture, allowing better automation of architectural governance. While we won’t summarize that book here, it gives an important new nomenclature and way of thinking about architectural characteristics that will infuse much of the remainder of this book. Neal’s book covers techniques for building architectures that change gracefully over time. In Chapter 4, we describe architecture as the combination of requirements and additional concerns, as illustrated in Figure 1-7. 从极限编程到持续交付的思想演变仍在继续。工程实践的最新进展使架构中具备了新的能力。Neal 最近的书《Building Evolutionary Architectures》(O'Reilly)强调了关于工程实践与架构交集的新思维方式,从而更好地实现架构治理的自动化。虽然我们在这里不会总结这本书,但它提供了一种重要的新命名法和思维方式,关于架构特征,这将贯穿本书的其余部分。Neal 的书涵盖了构建随时间优雅变化的架构的技术。在第 4 章中,我们将架构描述为需求和额外关注点的结合,如图 1-7 所示。
Figure 1-7. The architecture for a software system consists of both requirements and all the other architectural characteristics 图 1-7. 软件系统的架构由需求和所有其他架构特征组成
As any experience in the software development world illustrates, nothing remains static. Thus, architects may design a system to meet certain criteria, but that design must survive both implementation (how can architects make sure that their design is implemented correctly) and the inevitable change driven by the software development ecosystem. What we need is an evolutionary architecture. 正如软件开发领域的任何经验所示,事物没有什么是静态的。因此,架构师可能会设计一个系统以满足某些标准,但该设计必须经受住实施(架构师如何确保他们的设计被正确实施)和软件开发生态系统驱动的不可避免的变化。我们需要的是一种进化架构。
Building Evolutionary Architectures introduces the concept of using fitness functions to protect (and govern) architectural characteristics as change occurs over time. The concept comes from evolutionary computing. When designing a genetic algorithm, developers have a variety of techniques to mutate the solution, evolving new solutions iteratively. When designing such an algorithm for a specific goal, developers must measure the outcome to see if it is closer or further away from an optimal solution; that measure is a fitness function. For example, if developers designed a genetic algorithm to solve the traveling salesperson problem (whose goal is the shortest route between various cities), the fitness function would look at the path length. 《构建进化架构》引入了使用适应度函数来保护(和管理)架构特性这一概念,以应对随时间变化的情况。这个概念源于进化计算。在设计遗传算法时,开发人员有多种技术可以变异解决方案,迭代地演化出新的解决方案。当为特定目标设计这样的算法时,开发人员必须测量结果,以查看它是更接近还是更远离最优解决方案;这个测量就是适应度函数。例如,如果开发人员设计了一个遗传算法来解决旅行推销员问题(其目标是在各个城市之间找到最短路径),那么适应度函数将关注路径长度。
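The traveling salesperson example can be sketched in a few lines. The code below shows only the fitness function, not the genetic algorithm around it, and it assumes the common formulation in which the route is a closed tour that returns to the starting city. The city names and coordinates are made up for illustration.

```python
import math


def route_length(route, coordinates):
    """Fitness for a candidate route: total length of the closed tour (lower is better)."""
    total = 0.0
    for i, city in enumerate(route):
        next_city = route[(i + 1) % len(route)]  # wrap around to return to the start
        total += math.dist(coordinates[city], coordinates[next_city])
    return total


# Hypothetical cities laid out on the corners of a 4-by-3 rectangle.
coordinates = {"A": (0, 0), "B": (0, 3), "C": (4, 3), "D": (4, 0)}

around_the_edge = ["A", "B", "C", "D"]  # follows the perimeter
criss_cross = ["A", "C", "B", "D"]      # crosses both diagonals

print(route_length(around_the_edge, coordinates))
print(route_length(criss_cross, coordinates))
```

A genetic algorithm would call a function like this on every mutated candidate and keep the ones with the smaller result; here the perimeter route scores better than the one that crosses the diagonals.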
Building Evolutionary Architectures co-opts this idea to create architectural fitness functions: an objective integrity assessment of some architectural characteristic(s). This assessment may include a variety of mechanisms, such as metrics, unit tests, monitors, and chaos engineering. For example, an architect may identify page load time as an important characteristic of the architecture. To allow the system to change without degrading performance, the architect builds a fitness function as a test that measures page load time for each page and then runs the test as part of the continuous integration for the project. Thus, architects always know the status of critical parts of the architecture because they have a verification mechanism in the form of fitness functions for each part.
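The page load example might look something like the following sketch. It is a simplified stand-in rather than the approach from Building Evolutionary Architectures: real page loads are replaced by hypothetical loader functions that merely sleep, and the budget value, `measure_load_seconds`, and `pages_over_budget` are names invented here.

```python
import time


def measure_load_seconds(load_page) -> float:
    """Time a single call to `load_page` using a monotonic clock."""
    start = time.perf_counter()
    load_page()
    return time.perf_counter() - start


def pages_over_budget(page_loaders, budget_seconds):
    """Return {page_name: elapsed_seconds} for every page that exceeds the budget."""
    slow = {}
    for name, loader in page_loaders.items():
        elapsed = measure_load_seconds(loader)
        if elapsed > budget_seconds:
            slow[name] = elapsed
    return slow


# Stand-ins for real page loads; a real check would request each page instead.
def fast_page():
    pass


def slow_page():
    time.sleep(0.05)


over = pages_over_budget({"home": fast_page, "report": slow_page},
                         budget_seconds=0.02)
print(sorted(over))
```

In a continuous integration job, a non-empty result would fail the build, which is what turns the measurement into a governance mechanism rather than a report nobody reads.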
We won’t go into the full details of fitness functions here. However, we will point out opportunities and examples of the approach where applicable. Note the correlation between how often fitness functions execute and the feedback they provide. You’ll see that adopting Agile engineering practices such as continuous integration, automated machine provisioning, and similar practices makes building resilient architectures easier. It also illustrates how intertwined architecture has become with engineering practices.
Operations/DevOps
The most obvious recent intersection between architecture and related fields occurred with the advent of DevOps, driven by some rethinking of architectural axioms. For many years, many companies considered operations a separate function from software development; they often outsourced operations to another company as a cost-saving measure. Many architectures designed during the 1990s and 2000s assumed that architects couldn’t control operations and were built defensively around that restriction (for a good example of this, see Space-Based Architecture in Chapter 15).
However, a few years ago, several companies started experimenting with new forms of architecture that combine many operational concerns with the architecture. For example, in older-style architectures, such as ESB-driven SOA, the architecture was designed to handle things like elastic scale, greatly complicating the architecture in the process. Basically, architects were forced to defensively design around the limitations introduced by the cost-saving measure of outsourcing operations. Thus, they built architectures that could handle scale, performance, elasticity, and a host of other capabilities internally. The side effect of that design was vastly more complex architecture.
The builders of the microservices style of architecture realized that these operational concerns are better handled by operations. By creating a liaison between architecture and operations, the architects can simplify the design and rely on operations for the things they handle best. Thus, having realized that a misappropriation of resources had led to accidental complexity, architects and operations teamed up to create microservices, the details of which we cover in Chapter 17.
Process
Another axiom is that software architecture is mostly orthogonal to the software development process; the way that you build software (process) has little impact on the software architecture (structure). Thus, while the software development process a team uses has some impact on software architecture (especially around engineering practices), historically they have been thought of as mostly separate. Most books on software architecture ignore the software development process, making specious assumptions about things like predictability. However, the process by which teams develop software has an impact on many facets of software architecture. For example, many companies over the last few decades have adopted Agile development methodologies because of the nature of software. Architects in Agile projects can assume iterative development and therefore a faster feedback loop for decisions. That in turn allows architects to be more aggressive about experimentation and other knowledge that relies on feedback.
As the previous quote from Mark observes, all architecture becomes iterative; it’s only a matter of time. Toward that end, we’re going to assume a baseline of Agile methodologies throughout and call out exceptions where appropriate. For example, it is still common for many monolithic architectures to use older processes because of their age, politics, or other mitigating factors unrelated to software.
One critical aspect of architecture where Agile methodologies shine is restructuring. Teams often find that they need to migrate their architecture from one pattern to another. For example, a team started with a monolithic architecture because it was easy and fast to bootstrap, but now they need to move it to a more modern architecture. Agile methodologies support these kinds of changes better than planning-heavy processes because of the tight feedback loop and encouragement of techniques like the Strangler Pattern and feature toggles.
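To make the two techniques concrete, here is a minimal sketch of a toggle-driven router used in a Strangler-style migration: each feature is served by the legacy implementation until its toggle is switched on, and it can be switched back if feedback is bad. The class name, method names, and handler dictionaries are hypothetical and serve only as illustration.

```python
class ToggleRouter:
    """Routes each feature to the legacy or the replacement implementation."""

    def __init__(self, legacy, replacement, migrated=()):
        self._legacy = legacy            # {feature name: handler} in the monolith
        self._replacement = replacement  # {feature name: handler} in the new system
        self._migrated = set(migrated)   # features whose toggle is currently on

    def enable(self, feature):
        """Turn the toggle on: the new implementation now serves this feature."""
        self._migrated.add(feature)

    def disable(self, feature):
        """Turn the toggle off: fall back to the legacy implementation."""
        self._migrated.discard(feature)

    def handle(self, feature, payload):
        handlers = self._replacement if feature in self._migrated else self._legacy
        return handlers[feature](payload)
```

Migrating one feature at a time behind a toggle keeps each step small and reversible, which is the property the tight feedback loop depends on.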
Data
A large percentage of serious application development includes external data storage, often in the form of a relational (or, increasingly, NoSQL) database. However, many books about software architecture include only a light treatment of this important aspect of architecture. Code and data have a symbiotic relationship: one isn’t useful without the other.
Database administrators often work alongside architects to build data architecture for complex systems, analyzing how relationships and reuse will affect a portfolio of applications. We won’t delve into that level of specialized detail in this book. At the same time, we won’t ignore the existence of, and dependence on, external storage. In particular, when we talk about the operational aspects of architecture and architectural quantum (see “Architectural Quanta and Granularity” on page 92), we include important external concerns such as databases.
Laws of Software Architecture
While the scope of software architecture is almost impossibly broad, unifying elements do exist. The authors have first and foremost learned the First Law of Software Architecture by constantly stumbling across it:
Everything in software architecture is a trade-off.
-First Law of Software Architecture
Nothing exists on a nice, clean spectrum for software architects. Every decision must take into account many opposing factors.
If an architect thinks they have discovered something that isn’t a trade-off, more likely they just haven’t identified the trade-off yet.
-Corollary 1
We define software architecture in terms beyond structural scaffolding, incorporating principles, characteristics, and so on. Architecture is broader than just the combination of structural elements, reflected in our Second Law of Software Architecture:
Why is more important than how.
-Second Law of Software Architecture
The authors discovered the importance of this perspective when we tried keeping the results of exercises done by students during workshops as they crafted architecture solutions. Because the exercises were timed, the only artifacts we kept were the diagrams representing the topology. In other words, we captured how they solved the problem but not why the team made particular choices. An architect can look at an existing system they have no knowledge of and ascertain how the structure of the architecture works, but will struggle to explain why certain choices were made versus others.
Throughout the book, we highlight why architects make certain decisions along with trade-offs. We also highlight good techniques for capturing important decisions in “Architecture Decision Records” on page 285.
Foundations
To understand important trade-offs in architecture, developers must understand some basic concepts and terminology concerning components, modularity, coupling, and connascence.
CHAPTER 2
Architectural Thinking
An architect sees things differently than a developer does, in much the same way that a meteorologist might see clouds differently than an artist does. This is called architectural thinking. Unfortunately, too many architects believe that architectural thinking is simply “thinking about the architecture.”
Architectural thinking is much more than that. It is seeing things with an architectural eye, or an architectural point of view. There are four main aspects of thinking like an architect. First, it’s understanding the difference between architecture and design and knowing how to collaborate with development teams to make architecture work. Second, it’s about having a wide breadth of technical knowledge while still maintaining a certain level of technical depth, allowing the architect to see solutions and possibilities that others do not see. Third, it’s about understanding, analyzing, and reconciling trade-offs between various solutions and technologies. Finally, it’s about understanding the importance of business drivers and how they translate to architectural concerns.
In this chapter we explore these four aspects of thinking like an architect and seeing things with an architectural eye.
Architecture Versus Design
The difference between architecture and design is often a confusing one. Where does architecture end and design begin? What responsibilities does an architect have versus those of a developer? Thinking like an architect is knowing the difference between architecture and design and seeing how the two integrate closely to form solutions to business and technical problems.
Consider Figure 2-1, which illustrates the traditional responsibilities an architect has, as compared to those of a developer. As shown in the diagram, an architect is responsible for things like analyzing business requirements to extract and define the architectural characteristics (“-ilities”), selecting which architecture patterns and styles would fit the problem domain, and creating components (the building blocks of the system). The artifacts created from these activities are then handed off to the development team, which is responsible for creating class diagrams for each component, creating user interface screens, and developing and testing source code.
Figure 2-1. Traditional view of architecture versus design
There are several issues with the traditional responsibility model illustrated in Figure 2-1. As a matter of fact, this illustration shows exactly why architecture rarely works. Specifically, it is the unidirectional arrow passing through the virtual and physical barriers separating the architect from the developer that causes all of the problems associated with architecture. Decisions an architect makes sometimes never make it to the development teams, and decisions development teams make that change the architecture rarely get back to the architect. In this model the architect is disconnected from the development teams, and as such the architecture rarely achieves what it originally set out to do.
To make architecture work, both the physical and virtual barriers that exist between architects and developers must be broken down, thus forming a strong bidirectional relationship between architects and development teams. The architect and developer must be on the same virtual team to make this work, as depicted in Figure 2-2. Not only does this model facilitate strong bidirectional communication between architecture and development, but it also allows the architect to provide mentoring and coaching to developers on the team.
Figure 2-2. Making architecture work through collaboration
Unlike the old-school waterfall approaches to static and rigid software architecture, the architecture of today’s systems changes and evolves with every iteration or phase of a project. A tight collaboration between the architect and the development team is essential for the success of any software project. So where does architecture end and design begin? It doesn’t. They are both part of the circle of life within a software project and must always be kept in synchronization with each other in order to succeed.
Technical Breadth
The scope of technological detail differs between developers and architects. Unlike a developer, who must have a significant amount of technical depth to perform their job, a software architect must have a significant amount of technical breadth to think like an architect and see things with an architecture point of view. This is illustrated by the knowledge pyramid shown in Figure 2-3, which encapsulates all the technical knowledge in the world. It turns out that the kind of information a technologist should value differs with career stages.
Figure 2-3. The pyramid representing all knowledge
As shown in Figure 2-3, any individual can partition all their knowledge into three sections: stuff you know, stuff you know you don’t know, and stuff you don’t know you don’t know.
Stuff you know includes the technologies, frameworks, languages, and tools a technologist uses on a daily basis to perform their job, such as knowing Java as a Java programmer. Stuff you know you don’t know includes those things a technologist knows a little about or has heard of but has little or no expertise in. A good example of this level of knowledge is the Clojure programming language. Most technologists have heard of Clojure and know it’s a programming language based on Lisp, but they can’t code in the language. Stuff you don’t know you don’t know is the largest part of the knowledge pyramid and includes the entire host of technologies, tools, frameworks, and languages that would be the perfect solution to a problem a technologist is trying to solve, but the technologist doesn’t even know those things exist.
A developer’s early career focuses on expanding the top of the pyramid, to build experience and expertise. This is the ideal focus early on, because developers need more perspective, working knowledge, and hands-on experience. Expanding the top incidentally expands the middle section; as developers encounter more technologies and related artifacts, it adds to their stock of stuff you know you don’t know.
In Figure 2-4, expanding the top of the pyramid is beneficial because expertise is valued. However, the stuff you know is also the stuff you must maintain; nothing is static in the software world. If a developer becomes an expert in Ruby on Rails, that expertise won’t last if they ignore Ruby on Rails for a year or two. The things at the top of the pyramid require time investment to maintain expertise. Ultimately, the size of the top of an individual’s pyramid is their technical depth.
Figure 2-4. Developers must maintain expertise to retain it
However, the nature of knowledge changes as developers transition into the architect role. A large part of the value of an architect is a broad understanding of technology and how to use it to solve particular problems. For example, as an architect, it is more beneficial to know that five solutions exist for a particular problem than to have singular expertise in only one. The most important parts of the pyramid for architects are the top and middle sections; how far the middle section penetrates into the bottom section represents an architect’s technical breadth, as shown in Figure 2-5.
Figure 2-5. What someone knows is technical depth, and how much someone knows is technical breadth
As an architect, breadth is more important than depth. Because architects must make decisions that match capabilities to technical constraints, a broad understanding of a wide variety of solutions is valuable. Thus, for an architect, the wise course of action is to sacrifice some hard-won expertise and use that time to broaden their portfolio, as shown in Figure 2-6. As illustrated in the diagram, some areas of expertise will remain, probably in particularly enjoyable technology areas, while others usefully atrophy.
Figure 2-6. Enhanced breadth and shrinking depth for the architect role
Our knowledge pyramid illustrates how fundamentally the role of architect differs from that of developer. Developers spend their whole careers honing expertise, and transitioning to the architect role means a shift in that perspective, which many individuals find difficult. This in turn leads to two common dysfunctions. First, an architect tries to maintain expertise in a wide variety of areas, succeeding in none of them and working themselves ragged in the process. Second, it manifests as stale expertise: the mistaken sensation that your outdated information is still cutting edge. We see this often in large companies where the developers who founded the company have moved into leadership roles yet still make technology decisions using ancient criteria (see “Frozen Caveman Anti-Pattern” on page 30).
Architects should focus on technical breadth so that they have a larger quiver from which to draw arrows. Developers transitioning to the architect role may have to change the way they view knowledge acquisition. Balancing their portfolio of knowledge regarding depth versus breadth is something every developer should consider throughout their career.
Frozen Caveman Anti-Pattern
A behavioral anti-pattern commonly observed in the wild, the Frozen Caveman Anti-Pattern, describes an architect who always reverts to their pet irrational concern for every architecture. For example, one of Neal’s colleagues worked on a system that featured a centralized architecture. Yet, each time they delivered the design to the client architects, the persistent question was “But what if we lose Italy?” Several years before, a freak communication problem had prevented headquarters from communicating with its stores in Italy, causing great inconvenience. While the chances of a recurrence were extremely small, the architects had become obsessed with this particular architectural characteristic.
Generally, this anti-pattern manifests in architects who have been burned in the past by a poor decision or unexpected occurrence, making them particularly cautious in the future. While risk assessment is important, it should be realistic as well. Understanding the difference between genuine and perceived technical risk is part of the ongoing learning process for architects. Thinking like an architect requires overcoming these “frozen caveman” ideas and experiences, seeing other solutions, and asking more relevant questions.
Analyzing Trade-Offs
Thinking like an architect is all about seeing trade-offs in every solution, technical or otherwise, and analyzing those trade-offs to determine what is the best solution. To quote Mark (one of your authors):
Architecture is the stuff you can’t Google.
Everything in architecture is a trade-off, which is why the famous answer to every architecture question in the universe is “it depends.” While many people get increasingly annoyed at this answer, it is unfortunately true. You cannot Google the answer to whether REST or messaging would be better, or whether microservices is the right architecture style, because it does depend. It depends on the deployment environment, business drivers, company culture, budgets, timeframes, developer skill set, and dozens of other factors. Everyone’s environment, situation, and problem is different, which is why architecture is so hard. To quote Neal (another one of your authors):
There are no right or wrong answers in architecture, only trade-offs.
For example, consider an item auction system, as illustrated in Figure 2-7, where someone places a bid for an item up for auction.
Figure 2-7. Auction system example of a trade-off: queues or topics?
The Bid Producer service generates a bid from the bidder and then sends that bid amount to the Bid Capture, Bid Tracking, and Bid Analytics services. This could be done by using queues in a point-to-point messaging fashion or by using a topic in a publish-and-subscribe messaging fashion. Which one should the architect use? You can’t Google the answer. Architectural thinking requires the architect to analyze the trade-offs associated with each option and select the best one given the specific situation.
The two messaging options for the item auction system are shown in Figures 2-8 and 2-9, with Figure 2-8 illustrating the use of a topic in a publish-and-subscribe messaging model, and Figure 2-9 illustrating the use of queues in a point-to-point messaging model.
Figure 2-8. Use of a topic for communication between services
Figure 2-9. Use of queues for communication between services
The clear advantage (and seemingly obvious solution) to this problem in Figure 2-8 is that of architectural extensibility. The Bid Producer service only requires a single connection to a topic, unlike the queue solution in Figure 2-9 where the Bid Producer needs to connect to three different queues. If a new service called Bid History were to be added to this system due to the requirement to provide each bidder with a history of all the bids they made in each auction, no changes at all would be needed to the existing system. When the new Bid History service is created, it could simply subscribe to the topic already containing the bid information. In the queue option shown in Figure 2-9, however, a new queue would be required for the Bid History service, and the Bid Producer would need to be modified to add an additional connection to the new queue. The point here is that using queues requires significant change to the system when adding new bidding functionality, whereas with the topic approach no changes are needed at all in the existing infrastructure. Also, notice that the Bid Producer is more decoupled in the topic option: the Bid Producer doesn’t know how the bidding information will be used or by which services. In the queue option the Bid Producer knows exactly how the bidding information is used (and by whom), and hence is more coupled to the system.
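The extensibility difference can be sketched with a toy in-memory model. This is an illustration under stated assumptions, not a real message broker: the `Topic`, `Queue`, and producer classes are hypothetical stand-ins, and messages are plain dictionaries.

```python
from collections import deque


class Topic:
    """Publish-and-subscribe: every subscriber receives each message."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler):
        self._subscribers.append(handler)

    def publish(self, message):
        for handler in self._subscribers:
            handler(message)


class Queue:
    """Point-to-point: one queue per consuming service."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        return self._messages.popleft()


class TopicBidProducer:
    """Knows only the topic, not who consumes the bids."""

    def __init__(self, topic):
        self._topic = topic

    def place_bid(self, amount):
        self._topic.publish({"amount": amount})


class QueueBidProducer:
    """Must hold a connection to every consumer's queue."""

    def __init__(self, queues):
        self._queues = queues  # {consumer name: Queue}

    def place_bid(self, amount):
        for queue in self._queues.values():
            queue.send({"amount": amount})
```

In this sketch, adding Bid History to the topic version is a single `topic.subscribe(...)` call that the producer never sees, whereas the queue version needs a new `Queue` and a producer configured with it.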
With this analysis it seems clear that the topic approach using the publish-and-subscribe messaging model is the obvious and best choice. However, to quote Rich Hickey, the creator of the Clojure programming language:
Programmers know the benefits of everything and the trade-offs of nothing. Architects need to understand both.
Thinking architecturally is looking at the benefits of a given solution, but also analyzing the negatives, or trade-offs, associated with a solution. Continuing with the auction system example, a software architect would analyze the negatives of the topic solution. In analyzing the differences, notice first in Figure 2-8 that with a topic, anyone can access bidding data, which introduces a possible issue with data access and data security. In the queue model illustrated in Figure 2-9, the data sent to the queue can only be accessed by the specific consumer receiving that message. If a rogue service did listen in on a queue, those bids would not be received by the corresponding service, and a notification would immediately be sent about the loss of data (and hence a possible security breach). In other words, it is very easy to wiretap into a topic, but not a queue.
In addition to the security issue, the topic solution in Figure 2-8 only supports homogeneous contracts. All services receiving the bidding data must accept the same contract and set of bidding data. In the queue option in Figure 2-9, each consumer can have its own contract specific to the data it needs. For example, suppose the new Bid History service requires the current asking price along with the bid, but no other service needs that information. In this case, the contract would need to be modified, impacting all other services using that data. In the queue model, this would be a separate channel, hence a separate contract not impacting any other service.
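One way to picture the contract difference is the sketch below, which assumes the contracts are modeled as Python dataclasses. The field names and the helper function are hypothetical; the text specifies only that Bid History needs the asking price and no other service does.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BidMessage:
    """Shared contract: with a topic, every subscriber receives exactly this."""
    item_id: str
    bidder_id: str
    amount: float


@dataclass(frozen=True)
class BidHistoryMessage:
    """Contract used only on the Bid History queue in the queue option."""
    item_id: str
    bidder_id: str
    amount: float
    asking_price: float


def to_history_message(bid, asking_price):
    """Build the Bid History payload without touching the shared contract."""
    return BidHistoryMessage(**asdict(bid), asking_price=asking_price)
```

With a topic, `asking_price` would have to be added to `BidMessage` itself, so every subscriber would see the changed contract. With queues, only the Bid History channel carries the extra field.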
Another disadvantage of the topic model illustrated in Figure 2-8 is that it does not support monitoring of the number of messages in the topic and hence auto-scaling capabilities. However, with the queue option in Figure 2-9, each queue can be monitored individually, and programmatic load balancing applied to each bidding consumer so that each can be automatically scaled independently of one another. Note that this trade-off is technology specific in that the Advanced Message Queuing Protocol (AMQP) can support programmatic load balancing and monitoring because of the separation between an exchange (what the producer sends to) and a queue (what the consumer listens to).
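The scaling side of this trade-off can be sketched as a rule that turns each queue's depth into a consumer count. How the depths are obtained depends on the broker and is not shown here; the thresholds, limits, and queue names are assumptions for illustration only.

```python
import math


def desired_consumers(queue_depth, messages_per_consumer=100,
                      min_consumers=1, max_consumers=10):
    """Consumers needed to work off queue_depth, clamped to the allowed range."""
    needed = math.ceil(queue_depth / messages_per_consumer)
    return max(min_consumers, min(max_consumers, needed))


def scaling_plan(queue_depths, **limits):
    """Size each consumer independently from its own queue's depth.

    queue_depths maps a queue name to its current message count, which the
    caller is assumed to have read from the broker's monitoring interface.
    """
    return {name: desired_consumers(depth, **limits)
            for name, depth in queue_depths.items()}
```

Each entry in the plan depends only on its own queue, which is the independence the queue option provides. A single shared topic, as described above, offers no per-consumer depth to feed into such a rule.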
Given this trade-off analysis, now which is the better option? And the answer? It depends! Table 2-1 summarizes these trade-offs.
Table 2-1. Trade-offs for topics
| Topic advantages | Topic disadvantages |
| :--- | :--- |
| Architectural extensibility | Data access and data security concerns |
| Service decoupling | No heterogeneous contracts |
| | Monitoring and programmatic scalability |
The point here is that everything in software architecture has a trade-off: an advantage and a disadvantage. Thinking like an architect is analyzing these trade-offs, then asking “which is more important: extensibility or security?” The decision between different solutions will always depend on the business drivers, environment, and a host of other factors.
Understanding Business Drivers
Thinking like an architect is understanding the business drivers that are required for the success of the system and translating those requirements into architecture characteristics (such as scalability, performance, and availability). This is a challenging task that requires the architect to have some level of business domain knowledge and healthy, collaborative relationships with key business stakeholders. We’ve devoted several chapters in the book to this specific topic. In Chapter 4 we define various architecture characteristics. In Chapter 5 we describe ways to identify and qualify architecture characteristics. And in Chapter 6 we describe how to measure each of these characteristics to ensure the business needs of the system are met.
Balancing Architecture and Hands-On Coding
One of the difficult tasks an architect faces is how to balance hands-on coding with software architecture. We firmly believe that every architect should code and be able to maintain a certain level of technical depth (see “Technical Breadth” on page 25). While this may seem like an easy task, it is sometimes rather difficult to accomplish.
The first tip in striving for a balance between hands-on coding and being a software architect is avoiding the bottleneck trap. The bottleneck trap occurs when the architect has taken ownership of code within the critical path of a project (usually the underlying framework code) and becomes a bottleneck to the team. This happens because the architect is not a full-time developer and therefore must balance between playing the developer role (writing and testing source code) and the architect role (drawing diagrams, attending meetings, and well, attending more meetings).
One way to avoid the bottleneck trap as an effective software architect is to delegate the critical path and framework code to others on the development team and then focus on coding a piece of business functionality (a service or a screen) one to three iterations down the road. Three positive things happen by doing this. First, the architect is gaining hands-on experience writing production code while no longer becoming a bottleneck on the team. Second, the critical path and framework code is distributed to the development team (where it belongs), giving them ownership and a better understanding of the harder parts of the system. Third, and perhaps most important, the architect is writing the same business-related source code as the development team and is therefore better able to identify with the development team in terms of the pain they might be going through with processes, procedures, and the development environment.
Suppose, however, that the architect is not able to develop code with the development team. How can a software architect still remain hands-on and maintain some level of technical depth? There are four basic ways an architect can still remain hands-on at work without having to “practice coding from home” (although we recommend practicing coding at home as well).
The first way is to do frequent proof-of-concepts or POCs. This practice not only requires the architect to write source code, but it also helps validate an architecture decision by taking the implementation details into account. For example, if an architect is stuck trying to make a decision between two caching solutions, one effective way to help make this decision is to develop a working example in each caching product and compare the results. This allows the architect to see first-hand the implementation details and the amount of effort required to develop the full solution. It also allows the architect to better compare architectural characteristics such as scalability, performance, or overall fault tolerance of the different caching solutions.
Our advice when doing proof-of-concept work is that, whenever possible, the architect should write the best production-quality code they can. We recommend this practice for two reasons. First, quite often, throwaway proof-of-concept code goes into the source code repository and becomes the reference architecture or guiding example for others to follow. The last thing an architect would want is for their throwaway, sloppy code to be a representation of their typical work. The second reason is that by writing production-quality proof-of-concept code, the architect gets practice writing quality, well-structured code rather than continually developing bad coding practices.
Another way an architect can remain hands-on is to tackle some of the technical debt stories or architecture stories, freeing the development team up to work on the critical functional user stories. These stories are usually low priority, so if the architect does not have the chance to complete a technical debt or architecture story within a given iteration, it’s not the end of the world and generally does not impact the success of the iteration.
Similarly, working on bug fixes within an iteration is another way of maintaining hands-on coding while helping the development team as well. While certainly not glamorous, this technique allows the architect to identify where issues and weaknesses may be within the code base and possibly the architecture.
Leveraging automation by creating simple command-line tools and analyzers to help the development team with their day-to-day tasks is another great way to maintain hands-on coding skills while making the development team more effective. Look for repetitive tasks the development team performs and automate the process. The development team will be grateful for the automation. Some examples are automated source validators to help check for specific coding standards not found in other lint tests, automated checklists, and repetitive manual code refactoring tasks.
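One possible shape for such a source validator is sketched below in Python. The rule it enforces (no direct `System.out.println` calls in Java sources) and the names `find_violations` and `scan` are illustrative assumptions, not standards taken from the text.

```python
import re
from pathlib import Path
from typing import List

# Hypothetical team standard: Java sources must not write directly to stdout.
FORBIDDEN = re.compile(r"System\.out\.print(ln)?\s*\(")


def find_violations(source: str) -> List[int]:
    """Return the 1-based numbers of the lines that break the rule."""
    return [
        number
        for number, line in enumerate(source.splitlines(), start=1)
        if FORBIDDEN.search(line)
    ]


def scan(root: str) -> List[str]:
    """Check every .java file under root and return one message per hit."""
    messages = []
    for path in sorted(Path(root).rglob("*.java")):
        for number in find_violations(path.read_text(encoding="utf-8")):
            messages.append(f"{path}:{number}: direct console output")
    return messages
```

A team might call `scan("src")` from a build step or commit hook and fail the step whenever the returned list is not empty.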
Automation can also be in the form of architectural analysis and fitness functions to ensure the vitality and compliance of the architecture. For example, an architect can use ArchUnit on the Java platform to automate architectural compliance checks, or write custom fitness functions to ensure architectural compliance while gaining hands-on experience. We talk about these techniques in Chapter 6.
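A custom fitness function can be very small. The sketch below is plain Python rather than ArchUnit, and it assumes a hypothetical three-layer application whose module names carry the layer as their second segment (for example `app.business.orders`). The layer names, the `ALLOWED` rule table, and the hand-written dependency data are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layering rule: each layer may depend only on the layers listed.
ALLOWED: Dict[str, Set[str]] = {
    "presentation": {"business"},
    "business": {"persistence"},
    "persistence": set(),
}


def layer_of(module: str) -> str:
    """Assume the layer is the second name segment, e.g. 'app.business.orders'."""
    return module.split(".")[1]


def layering_violations(imports: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return (importer, imported) pairs that cross layers in a forbidden way."""
    violations = []
    for importer, targets in sorted(imports.items()):
        for target in sorted(targets):
            source, dest = layer_of(importer), layer_of(target)
            if source != dest and dest not in ALLOWED[source]:
                violations.append((importer, target))
    return violations


# Hand-written dependency data standing in for a real import scan.
sample = {
    "app.presentation.screen": {"app.business.orders"},
    "app.business.orders": {"app.persistence.order_dao"},
    "app.persistence.order_dao": {"app.presentation.screen"},  # breaks the rule
}
print(layering_violations(sample))
# [('app.persistence.order_dao', 'app.presentation.screen')]
```

Running a check like this as a unit test turns the layering rule into something the build enforces rather than something reviewers have to remember.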
A final technique to remain hands-on as an architect is to do frequent code reviews. While the architect is not actually writing code, at least they are involved in the source code. Further, doing code reviews has the added benefits of being able to ensure compliance with the architecture and to seek out mentoring and coaching opportunities on the team.
CHAPTER 3
Modularity
First, we want to untangle some common terms used and overused in discussions about architecture surrounding modularity and provide definitions for use throughout the book.
95% of the words [about software architecture] are spent extolling the benefits of “modularity” and that little, if anything, is said about how to achieve it.
-Glenford J. Myers (1978)
Different platforms offer different reuse mechanisms for code, but all support some way of grouping related code together into modules. While this concept is universal in software architecture, it has proven slippery to define. A casual internet search yields dozens of definitions, with no consistency (and some contradictions). As you can see from the quote from Myers, this isn’t a new problem. However, because no recognized definition exists, we must jump into the fray and provide our own definitions for the sake of consistency throughout the book.
Understanding modularity and its many incarnations in the development platform of choice is critical for architects. Many of the tools we have to analyze architecture (such as metrics, fitness functions, and visualizations) rely on these modularity concepts. Modularity is an organizing principle. If an architect designs a system without paying attention to how the pieces wire together, they end up creating a system that presents myriad difficulties. To use a physics analogy, software systems model complex systems, which tend toward entropy (or disorder). Energy must be added to a physical system to preserve order. The same is true for software systems: architects must constantly expend energy to ensure good structural soundness, which won’t happen by accident.
Preserving good modularity exemplifies our definition of an implicit architecture characteristic: virtually no project features a requirement that asks the architect to ensure good modular distinction and communication, yet sustainable code bases require order and consistency.
Definition
The dictionary defines module as “each of a set of standardized parts or independent units that can be used to construct a more complex structure.” We use modularity to describe a logical grouping of related code, which could be a group of classes in an object-oriented language or functions in a structured or functional language. Most languages provide mechanisms for modularity (package in Java, namespace in .NET, and so on). Developers typically use modules as a way to group related code together. For example, the com.mycompany.customer package in Java should contain things related to customers.
Languages now feature a wide variety of packaging mechanisms, making a developer’s chore of choosing between them difficult. For example, in many modern languages, developers can define behavior in functions/methods, classes, or packages/namespaces, each with different visibility and scoping rules. Other languages complicate this further by adding programming constructs such as the metaobject protocol to provide developers even more extension mechanisms.
Architects must be aware of how developers package things because it has important implications in architecture. For example, if several packages are tightly coupled together, reusing one of them for related work becomes more difficult.
Modular Reuse Before Classes
Developers who predate object-oriented languages may puzzle over why so many different separation schemes commonly exist. Much of the reason has to do with backward compatibility, not of code but rather for how developers think about things. In March of 1968, Edsger Dijkstra published a letter in the Communications of the ACM entitled “Go To Statement Considered Harmful.” He denigrated the use of the GOTO statement, common in programming languages at the time, which allowed nonlinear leaping around within code, making reasoning and debugging difficult.
This letter helped usher in the era of structured programming languages, exemplified by Pascal and C, which encouraged deeper thinking about how things fit together. Developers quickly realized that most of the languages had no good way to group like things together logically. Thus, the short era of modular languages was born, such as Modula (Pascal creator Niklaus Wirth’s next language) and Ada. These languages had the programming construct of a module, much as we think about packages or namespaces today (but without the classes).
The modular programming era was short-lived. Object-oriented languages became popular because they offered new ways to encapsulate and reuse code. Still, language designers realized the utility of modules, retaining them in the form of packages, namespaces, etc. Many odd compatibility features exist in languages to support these different paradigms. For example, Java supports modular (via packages and package-level initialization using static initializers), object-oriented, and functional paradigms, each programming style with its own scoping rules and quirks.
For discussions about architecture, we use modularity as a general term to denote a related grouping of code: classes, functions, or any other grouping. This doesn’t imply a physical separation, merely a logical one; the difference is sometimes important. For example, lumping a large number of classes together in a monolithic application may make sense from a convenience standpoint. However, when it comes time to restructure the architecture, the coupling encouraged by loose partitioning becomes an impediment to breaking the monolith apart. Thus, it is useful to talk about modularity as a concept separate from the physical separation forced or implied by a particular platform.
It is worth noting the general concept of namespace, separate from the technical implementation in the .NET platform. Developers often need precise, fully qualified names for software assets to separate different software assets (components, classes, and so on) from each other. The most obvious example that people use every day is the internet: unique, global identifiers tied to IP addresses. Most languages have some modularity mechanism that doubles as a namespace to organize things: variables, functions, and/or methods. Sometimes the module structure is reflected physically. For example, Java requires that its package structure reflect the directory structure of the physical class files.
A Language with No Name Conflicts: Java 1.0
The original designers of Java had extensive experience dealing with name conflicts and clashes in the various programming platforms at the time. The original design of Java used a clever hack to avoid the possibility of ambiguity between two classes that had the same name. For example, what if your problem domain included a catalog order and an installation order, both named order but with very different connotations (and classes)? The solution in Java was to create the package namespace mechanism, along with the requirement that the physical directory structure must match the package name. Because filesystems won’t allow the same named file to reside in the same directory, they leveraged the inherent features of the operating system to avoid the possibility of ambiguity. Thus, the original classpath in Java contained only directories, disallowing the possibility of name conflicts.
However, as the language designers discovered, forcing every project to have a fully formed directory structure was cumbersome, especially as projects became larger. Plus, building reusable assets was difficult: frameworks and libraries must be “exploded” into the directory structure. In the second major release of Java (1.2, called Java 2), designers added the jar mechanism, allowing an archive file to act as a directory structure on a classpath. For the next decade, Java developers struggled with getting the classpath exactly right, as a combination of directories and JAR files. And, of course, the original intent was broken: now two JAR files could create conflicting names on a classpath, leading to numerous war stories of debugging class loaders.
Measuring Modularity
Given the importance of modularity to architects, they need tools to understand it. Fortunately, researchers created a variety of language-agnostic metrics to help architects understand modularity. We focus on three key concepts: cohesion, coupling, and connascence.
Cohesion
Cohesion refers to what extent the parts of a module should be contained within the same module. In other words, it is a measure of how related the parts are to one another. Ideally, a cohesive module is one where all the parts should be packaged together, because breaking them into smaller pieces would require coupling the parts together via calls between modules to achieve useful results.
Attempting to divide a cohesive module would only result in increased coupling and decreased readability.
-Larry Constantine
Computer scientists have defined a range of cohesion measures, listed here from best to worst:
Functional cohesion
Every part of the module is related to the other, and the module contains everything essential to function.
Sequential cohesion
Two modules interact, where one outputs data that becomes the input for the other.
Communicational cohesion
Two modules form a communication chain, where each operates on information and/or contributes to some output. For example, add a record to the database and generate an email based on that information.
Procedural cohesion
Two modules must execute code in a particular order.
Temporal cohesion
Modules are related based on timing dependencies. For example, many systems have a list of seemingly unrelated things that must be initialized at system startup; these different tasks are temporally cohesive.
Logical cohesion
The data within modules is related logically but not functionally. For example, consider a module that converts information from text, serialized objects, or streams. Operations are related, but the functions are quite different. A common example of this type of cohesion exists in virtually every Java project in the form of the StringUtils package: a group of static methods that operate on String but are otherwise unrelated.
Coincidental cohesion
Elements in a module are not related other than being in the same source file; this represents the most negative form of cohesion.
Despite having seven variants listed, cohesion is a less precise metric than coupling. Often, the degree of cohesiveness of a particular module is at the discretion of a particular architect. For example, consider this module definition:
Customer Maintenance
add customer
update customer
get customer
notify customer
get customer orders
cancel customer orders
Should the last two entries reside in this module or should the developer create two separate modules, such as:
Customer Maintenance
add customer
update customer
get customer
notify customer
Order Maintenance
get customer orders
cancel customer orders
Which is the correct structure? As always, it depends:
Are those the only two operations for Order Maintenance? If so, it may make sense to collapse those operations back into Customer Maintenance.
Is Customer Maintenance expected to grow much larger, encouraging developers to look for opportunities to extract behavior?
Does Order Maintenance require so much knowledge of Customer information that separating the two modules would require a high degree of coupling to make it functional? This relates back to the Larry Constantine quote.
These questions represent the kind of trade-off analysis at the heart of the job of a software architect.
Surprisingly, given the subjectiveness of cohesion, computer scientists have developed a good structural metric to determine cohesion (or, more specifically, the lack of cohesion). A well-known set of metrics named the Chidamber and Kemerer Object-oriented metrics suite was developed by the eponymous authors to measure particular aspects of object-oriented software systems. The suite includes many common code metrics, such as cyclomatic complexity (see “Cyclomatic Complexity” on page 79) and several important coupling metrics discussed in “Coupling” on page 44.
The Chidamber and Kemerer Lack of Cohesion in Methods (LCOM) metric measures the structural cohesion of a module, typically a component. The initial version appears in Equation 3-1.
Equation 3-1. LCOM, version 1
$$LCOM = \begin{cases} |P| - |Q|, & \text{if } |P| > |Q| \\ 0, & \text{otherwise} \end{cases}$$
$P$ increases by one for any method that doesn’t access a particular shared field and $Q$ decreases by one for methods that do share a particular shared field. The authors sympathize with those who don’t understand this formulation. Worse, it has gradually gotten more elaborate over time. The second variation introduced in 1996 (thus the name LCOM96b) appears in Equation 3-2.
Equation 3-2. LCOM96b
$$LCOM96b = \frac{1}{a} \sum_{j=1}^{a} \frac{m - \mu(A_j)}{m}$$
We won’t bother untangling the variables and operators in Equation 3-2 because the following written explanation is clearer. Basically, the LCOM metric exposes incidental coupling within classes. Here’s a better definition of LCOM:
LCOM
The sum of sets of methods not shared via sharing fields
Consider a class with private fields a and b. Many of the methods only access a, and many other methods only access b. The sum of the sets of methods not shared via sharing fields (a and b) is high; therefore, this class reports a high LCOM score, indicating that it scores high in lack of cohesion in methods. Consider the three classes shown in Figure 3-1.
Figure 3-1. Illustration of the LCOM metric, where fields are octagons and methods are squares
In Figure 3-1, fields appear as single letters and methods appear as blocks. In Class X, the LCOM score is low, indicating good structural cohesion. Class Y, however, lacks cohesion; each of the field/method pairs in Class Y could appear in its own class without affecting behavior. Class Z shows mixed cohesion, where developers could refactor the last field/method combination into its own class.
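To make the two formulas concrete, here is a minimal Python sketch that computes both scores from a map of method names to the fields each method touches. It rests on assumptions the text does not spell out: for Equation 3-1, P and Q are read as the number of method pairs that share no field and at least one field, respectively (the pairwise reading of the original metric); for Equation 3-2, a is taken as the number of fields, m as the number of methods, and μ(A_j) as the number of methods that access field A_j. The function names and the sample classes are hypothetical.

```python
from itertools import combinations
from typing import Dict, Set


def lcom1(accesses: Dict[str, Set[str]]) -> int:
    """LCOM version 1 over a map of method name -> fields it touches."""
    p = q = 0
    for left, right in combinations(accesses.values(), 2):
        if left & right:
            q += 1  # this pair of methods shares at least one field
        else:
            p += 1  # this pair of methods shares no field
    return p - q if p > q else 0


def lcom96b(accesses: Dict[str, Set[str]], fields: Set[str]) -> float:
    """LCOM96b: average, per field, of the share of methods that ignore it.

    Assumes at least one method and at least one field.
    """
    m = len(accesses)
    a = len(fields)
    total = 0.0
    for field in fields:
        mu = sum(1 for used in accesses.values() if field in used)
        total += (m - mu) / m
    return total / a


# Every method touches both fields: structurally cohesive.
cohesive = {"m1": {"a", "b"}, "m2": {"a", "b"}, "m3": {"a", "b"}}
# Half the methods touch only a, the other half only b.
split = {"m1": {"a"}, "m2": {"a"}, "m3": {"b"}, "m4": {"b"}}

print(lcom1(cohesive), lcom96b(cohesive, {"a", "b"}))  # 0 0.0
print(lcom1(split), lcom96b(split, {"a", "b"}))        # 2 0.5
```

The `split` class mirrors the a/b example above: its methods fall into two groups that never meet, so both scores rise, which is the signal that the class could be divided in two.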
The LCOM metric is useful to architects who are analyzing code bases in order to move from one architectural style to another. One of the common headaches when moving architectures is shared utility classes. Using the LCOM metric can help architects find classes that are incidentally coupled and should never have been a single class to begin with.
Many software metrics have serious deficiencies, and LCOM is not immune. All this metric can find is structural lack of cohesion; it has no way to determine logically if particular pieces fit together. This reflects back on our Second Law of Software Architecture: prefer why over how.
Coupling
Fortunately, we have better tools to analyze coupling in code bases, based in part on graph theory: because the method calls and returns form a call graph, analysis based on mathematics becomes possible. In 1979, Edward Yourdon and Larry Constantine published Structured Design: Fundamentals of a Discipline of Computer Program and Systems Design (Prentice-Hall), defining many core concepts, including the metrics afferent and efferent coupling. Afferent coupling measures the number of incoming connections to a code artifact (component, class, function, and so on). Efferent coupling measures the outgoing connections to other code artifacts. For virtually every platform, tools exist that allow architects to analyze the coupling characteristics of code in order to assist in restructuring, migrating, or understanding a code base.
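As a rough illustration of the two counts, the following Python sketch derives afferent and efferent coupling from a small dependency map. The map format, the function name `coupling`, and the artifact names are assumptions for the example; real tools extract this graph from source or bytecode.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


def coupling(dependencies: Dict[str, Set[str]]) -> Dict[str, Tuple[int, int]]:
    """Return {artifact: (afferent, efferent)} for a dependency map.

    dependencies maps each artifact to the artifacts it calls or uses.
    Self-references are ignored.
    """
    incoming = defaultdict(set)
    for source, targets in dependencies.items():
        for target in targets:
            if target != source:
                incoming[target].add(source)
    names = set(dependencies) | set(incoming)
    return {
        name: (len(incoming[name]), len(dependencies.get(name, set()) - {name}))
        for name in sorted(names)
    }


# Hypothetical artifacts and their outgoing dependencies.
deps = {
    "OrderService": {"CustomerRepo", "Mailer"},
    "BillingService": {"CustomerRepo"},
    "CustomerRepo": set(),
}
for name, (afferent, efferent) in coupling(deps).items():
    print(f"{name}: afferent={afferent} efferent={efferent}")
```

In this data `CustomerRepo` has two incoming connections and no outgoing ones, while `OrderService` has the reverse profile.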
Why Such Similar Names for Coupling Metrics?
Why are two critical metrics in the architecture world that represent opposite concepts named virtually the same thing, differing in only the vowels that sound the most alike? These terms originate from Yourdon and Constantine’s Structured Design. Borrowing concepts from mathematics, they coined the now-common afferent and efferent coupling terms, which should have been called incoming and outgoing coupling. However, because the original authors leaned toward mathematical symmetry rather than clarity, developers came up with several mnemonics to help out: a appears before e in the English alphabet, corresponding to incoming being before outgoing, or the observation that the letter e in efferent matches the initial letter in exit, corresponding to outgoing connections.
Abstractness, Instability, and Distance from the Main Sequence
While the raw value of component coupling has value to architects, several other derived metrics allow a deeper evaluation. These metrics were created by Robert Martin for a C++ book, but are widely applicable to other object-oriented languages.
Abstractness is the ratio of abstract artifacts (abstract classes, interfaces, and so on) to concrete artifacts (implementation). It represents a measure of abstractness versus implementation. For example, consider a code base with no abstractions, just a huge, single function of code (as in a single main() method). The flip side is a code base with too many abstractions, making it difficult for developers to understand how things wire together (for example, it takes developers a while to figure out what to do with an AbstractSingletonProxyFactoryBean).
The formula for abstractness appears in Equation 3-3.
Equation 3-3. Abstractness
$$A = \frac{\sum m^{a}}{\sum m^{c}}$$
In the equation, $m^{a}$ represents abstract elements (interfaces or abstract classes) within the module, and $m^{c}$ represents concrete elements (nonabstract classes). This metric looks for the same criteria. The easiest way to visualize this metric: consider an application with 5,000 lines of code, all in one main() method. The abstractness numerator is 1, while the denominator is 5,000, yielding an abstractness of almost 0. Thus, this metric measures the ratio of abstractions in your code.
Architects calculate abstractness by calculating the ratio of the sum of abstract artifacts to the sum of the concrete ones.
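As a quick worked example of that ratio, assume a hypothetical module that contains two interfaces, one abstract class, and twelve concrete classes (the counts are invented for illustration):

$$A = \frac{\sum m^{a}}{\sum m^{c}} = \frac{2 + 1}{12} = 0.25$$

A module with the same three abstractions but only three concrete classes would score 1, which suggests a design leaning heavily on abstraction.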
Another derived metric, instability, is defined as the ratio of efferent coupling to the sum of both efferent and afferent coupling, shown in Equation 3-4.
Equation 3-4. Instability
$$I = \frac{c^{e}}{c^{e} + c^{a}}$$
In the equation, $c^{e}$ represents efferent (or outgoing) coupling, and $c^{a}$ represents afferent (or incoming) coupling.
The instability metric determines the volatility of a code base. A code base that exhibits high degrees of instability breaks more easily when changed because of high coupling. For example, if a class calls to many other classes to delegate work, the calling class shows high susceptibility to breakage if one or more of the called methods change.
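The ratio described above is simple enough to sketch in a few lines of Python. The function name is hypothetical, and returning 0.0 for an artifact with no connections at all is an assumption made here to avoid dividing by zero; the text does not cover that case.

```python
def instability(efferent: int, afferent: int) -> float:
    """I = Ce / (Ce + Ca); an artifact with no connections is treated as 0.0."""
    total = efferent + afferent
    return efferent / total if total else 0.0


# A class that delegates to nine others and is used by one.
print(instability(efferent=9, afferent=1))  # 0.9
# A class used by nine others that calls only one.
print(instability(efferent=1, afferent=9))  # 0.1
```

The first class depends on many others and so is exposed to their changes; the second is depended upon by many and has little outgoing exposure.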
Distance from the Main Sequence
One of the few holistic metrics architects have for architectural structure is distance from the main sequence, a derived metric based on instability and abstractness, shown in Equation 3-5.
Equation 3-5. Distance from the main sequence 方程 3-5. 主序星的距离
D=|A+I-1|D=|A+I-1|
In the equation, A=\mathrm{A}= abstractness and I=\mathrm{I}= instability. 在方程中, A=\mathrm{A}= 抽象性和 I=\mathrm{I}= 不稳定性。
Note that both abstractness and instability are fractions whose results will always fall between 0 and 1 (except in extreme cases of abstractness that wouldn’t be practical). Thus, when graphing the relationship, we see the graph in Figure 3-2.
Figure 3-2. The main sequence defines the ideal relationship between abstractness and instability
The distance metric imagines an ideal relationship between abstractness and instability; classes that fall near this idealized line exhibit a healthy mixture of these two competing concerns. For example, graphing a particular class allows developers to calculate the distance from the main sequence metric, illustrated in Figure 3-3.
Figure 3-3. Normalized distance from the main sequence for a particular class
In Figure 3-3, developers graph the candidate class, then measure the distance from the idealized line. The closer to the line, the better balanced the class. Classes that fall too far into the upper-righthand corner enter into what architects call the zone of uselessness: code that is too abstract becomes difficult to use. Conversely, code that falls into the lower-lefthand corner enters the zone of pain: code with too much implementation and not enough abstraction becomes brittle and hard to maintain, illustrated in Figure 3-4.
Figure 3-4. Zones of Uselessness and Pain
Tools exist on many platforms to provide these measures, which assist architects who are analyzing code bases because of unfamiliarity, migration, or technical debt assessment.
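The distance calculation can be sketched in a few lines. This example assumes abstractness and instability have already been measured as values between 0 and 1; the class and method names are hypothetical. It reports only which side of the ideal line a module falls on, without asserting any threshold for where a zone begins.

```java
class MainSequenceDemo {
    /** Distance from the main sequence: D = |A + I - 1|. */
    static double distance(double abstractness, double instability) {
        return Math.abs(abstractness + instability - 1.0);
    }

    /** Reports which side of the ideal line A + I = 1 a module falls on. */
    static String side(double abstractness, double instability) {
        double sum = abstractness + instability;
        if (sum > 1.0) {
            return "toward the zone of uselessness"; // abstract and unstable
        }
        if (sum < 1.0) {
            return "toward the zone of pain";        // concrete and stable
        }
        return "on the main sequence";
    }

    public static void main(String[] args) {
        System.out.println(distance(0.5, 0.5) + " " + side(0.5, 0.5));
        System.out.println(distance(0.9, 0.9) + " " + side(0.9, 0.9));
        System.out.println(distance(0.1, 0.1) + " " + side(0.1, 0.1));
    }
}
```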
Limitations of Metrics
While the industry has a few code-level metrics that provide valuable insight into code bases, our tools are extremely blunt compared to analysis tools from other engineering disciplines. Even metrics derived directly from the structure of code require interpretation. For example, cyclomatic complexity (see “Cyclomatic Complexity” on page 79) measures complexity in code bases but cannot distinguish between essential complexity (the code is complex because the underlying problem is complex) and accidental complexity (the code is more complex than it should be). Virtually all code-level metrics require interpretation, but it is still useful to establish baselines for critical metrics such as cyclomatic complexity so that architects can assess which type of complexity a code base exhibits. We discuss setting up just such tests in “Governance and Fitness Functions” on page 82.
Notice that the previously mentioned book by Edward Yourdon and Larry Constantine (Structured Design: Fundamentals of a Discipline of Computer Program and Systems Design) predates the popularity of object-oriented languages, focusing instead on structured programming constructs, such as functions (not methods). It also defined other types of coupling that we do not cover here because they have been supplanted by connascence.
Connascence
In 1996, Meilir Page-Jones published What Every Programmer Should Know About Object-Oriented Design (Dorset House), refining the afferent and efferent coupling metrics and recasting them for object-oriented languages with a concept he named connascence. Here’s how he defined the term:
Two components are connascent if a change in one would require the other to be modified in order to maintain the overall correctness of the system.
—Meilir Page-Jones
He developed two types of connascence: static and dynamic.
Static connascence
Static connascence refers to source-code-level coupling (as opposed to execution-time coupling, covered in “Dynamic connascence” on page 50); it is a refinement of the afferent and efferent couplings defined by Structured Design. In other words, architects view the following types of static connascence as the degree to which something is coupled, either afferently or efferently:
Connascence of Name (CoN)
Multiple components must agree on the name of an entity.
Names of methods represent the most common way that code bases are coupled and the most desirable, especially in light of modern refactoring tools that make system-wide name changes trivial.
Connascence of Type (CoT)
Multiple components must agree on the type of an entity.
This type of connascence refers to the common facility in many statically typed languages to limit variables and parameters to specific types. However, this capability isn’t purely a language feature; some dynamically typed languages offer selective typing, notably Clojure and Clojure Spec.
Connascence of Meaning (CoM) or Connascence of Convention (CoC)
Multiple components must agree on the meaning of particular values.
The most common obvious case for this type of connascence in code bases is hard-coded numbers rather than constants. For example, it is common in some languages to define somewhere int TRUE = 1; int FALSE = 0. Imagine the problems if someone flips those values.
Connascence of Position (CoP)
Multiple components must agree on the order of values.
This is an issue with parameter values for method and function calls even in languages that feature static typing. For example, if a developer creates a method void updateSeat(String name, String seatLocation) and calls it with the values updateSeat("14D", "Ford, N"), the semantics aren’t correct even if the types are.
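One way to weaken this form of coupling is to trade connascence of position for connascence of type, so that the compiler rejects swapped arguments. The sketch below is illustrative only: the wrapper classes `PassengerName` and `SeatLocation` are hypothetical, and the methods return a string (rather than void, as in the text) purely so the result can be printed.

```java
class SeatDemo {
    /** Hypothetical wrapper for a passenger name. */
    static final class PassengerName {
        final String value;
        PassengerName(String value) { this.value = value; }
    }

    /** Hypothetical wrapper for a seat location such as "14D". */
    static final class SeatLocation {
        final String value;
        SeatLocation(String value) { this.value = value; }
    }

    // Same parameter list as the method in the text: two Strings, so swapped arguments still compile.
    static String updateSeat(String name, String seatLocation) {
        return name + " -> " + seatLocation;
    }

    // Typed overload: passing a SeatLocation where a PassengerName is expected no longer compiles.
    static String updateSeat(PassengerName name, SeatLocation seat) {
        return name.value + " -> " + seat.value;
    }

    public static void main(String[] args) {
        // Compiles, but the seat ends up in the name position.
        System.out.println(updateSeat("14D", "Ford, N"));
        // The wrapper types force each value into its own slot.
        System.out.println(updateSeat(new PassengerName("Ford, N"), new SeatLocation("14D")));
    }
}
```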
Connascence of Algorithm (CoA)
Multiple components must agree on a particular algorithm.
A common case for this type of connascence occurs when a developer defines a security hashing algorithm that must run on both the server and client and produce identical results to authenticate the user. Obviously, this represents a high form of coupling: if either side changes any detail of the algorithm, the handshake will no longer work.
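The effect of one side changing its algorithm can be sketched with the JDK's `MessageDigest`. This is not a real authentication scheme; the class name, the `handshake` method, and the token value are assumptions, and the two algorithm names simply stand in for "client" and "server" implementations.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

class HashAgreementDemo {
    /** Hashes a token with the named JDK digest algorithm. */
    static byte[] digest(String algorithm, String token) {
        try {
            return MessageDigest.getInstance(algorithm)
                    .digest(token.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("Digest algorithm not available: " + algorithm, e);
        }
    }

    /** True when both sides computed the same hash for the same token. */
    static boolean handshake(String clientAlgorithm, String serverAlgorithm, String token) {
        return MessageDigest.isEqual(digest(clientAlgorithm, token), digest(serverAlgorithm, token));
    }

    public static void main(String[] args) {
        // Both sides use the same algorithm, so the hashes match.
        System.out.println(handshake("SHA-256", "SHA-256", "session-token"));
        // One side switched algorithms, so the hashes no longer match.
        System.out.println(handshake("SHA-256", "SHA-1", "session-token"));
    }
}
```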
Dynamic connascence
The other type of connascence Page-Jones defined was dynamic connascence, which analyzes calls at runtime. The following is a description of the different types of dynamic connascence:
Connascence of Execution (CoE)
The order of execution of multiple components is important.
Consider this code:
Email email = new Email();
email.setRecipient("foo@example.com");
email.setSender("me@me.com");
email.send();
email.setSubject("whoops"); // too late: send() has already been called
It won’t work correctly because the subject is set after the message is sent; certain properties must be set in order.
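A class can make this ordering requirement explicit by refusing calls that arrive out of order. The sketch below uses a hypothetical stand-in for the `Email` class from the snippet above (the text does not show its implementation), and it only records that the message was sent rather than talking to a mail server.

```java
class ExecutionOrderDemo {
    /** Hypothetical stand-in for the Email class used in the text. */
    static final class Email {
        private String recipient;
        private String sender;
        private String subject;
        private boolean sent;

        void setRecipient(String recipient) { this.recipient = recipient; }
        void setSender(String sender) { this.sender = sender; }

        void setSubject(String subject) {
            if (sent) {
                throw new IllegalStateException("setSubject() called after send()");
            }
            this.subject = subject;
        }

        void send() {
            if (recipient == null || sender == null || subject == null) {
                throw new IllegalStateException("recipient, sender, and subject must be set before send()");
            }
            sent = true; // a real class would hand the message to a mail transport here
        }

        boolean isSent() { return sent; }
    }

    /** Replays the call order from the text and reports the outcome. */
    static String replayOriginalOrder() {
        Email email = new Email();
        email.setRecipient("foo@example.com");
        email.setSender("me@me.com");
        try {
            email.send();               // the subject is still missing at this point
            email.setSubject("whoops");
            return "sent";
        } catch (IllegalStateException e) {
            return "rejected: " + e.getMessage();
        }
    }

    /** Same calls with the subject set before send(). */
    static boolean sendInValidOrder() {
        Email email = new Email();
        email.setRecipient("foo@example.com");
        email.setSender("me@me.com");
        email.setSubject("hello");
        email.send();
        return email.isSent();
    }

    public static void main(String[] args) {
        System.out.println(replayOriginalOrder());
        System.out.println(sendInValidOrder());
    }
}
```

The checks do not remove the connascence of execution; they only turn a silent ordering mistake into an immediate, visible failure.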
Connascence of Timing (CoT)
The timing of the execution of multiple components is important.
The common case for this type of connascence is a race condition caused by two threads executing at the same time, affecting the outcome of the joint operation.
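A minimal illustration of such a race, assuming two threads that increment shared counters: the plain counter can lose updates depending on how the threads interleave, while the `AtomicInteger` always reaches the expected total. The class and method names are hypothetical, and the plain counter's result varies from run to run.

```java
import java.util.concurrent.atomic.AtomicInteger;

class TimingDemo {
    /** Runs two threads that each increment both counters; returns {plainCount, atomicCount}. */
    static int[] incrementFromTwoThreads(int perThread) {
        int[] plain = new int[1];                   // unsynchronized shared state
        AtomicInteger atomic = new AtomicInteger(); // atomic shared state
        Runnable work = () -> {
            for (int i = 0; i < perThread; i++) {
                plain[0]++;               // read-modify-write; concurrent updates can be lost
                atomic.incrementAndGet(); // indivisible increment
            }
        };
        Thread first = new Thread(work);
        Thread second = new Thread(work);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return new int[] { plain[0], atomic.get() };
    }

    public static void main(String[] args) {
        int[] counts = incrementFromTwoThreads(100_000);
        System.out.println("plain counter:  " + counts[0]); // may be less than 200000
        System.out.println("atomic counter: " + counts[1]);
    }
}
```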
Connascence of Values (CoV)
Occurs when several values relate to one another and must change together.
Consider the case where a developer has defined a rectangle as four points, representing the corners. To maintain the integrity of the data structure, the developer cannot randomly change one of the points without considering the impact on the other points.
The more common and problematic case involves transactions, especially in distributed systems. When an architect designs a system with separate databases, yet needs to update a single value across all of the databases, all the values must change together or not at all.
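For the rectangle case, one common way to contain this coupling is to keep the related values behind a single class whose operations always change them together. The following is a sketch under that assumption; the class name `CornerRectangle` and its methods are hypothetical.

```java
class CornerRectangle {
    // Four {x, y} corner points in order: top-left, top-right, bottom-right, bottom-left.
    private final int[][] corners;

    CornerRectangle(int left, int top, int width, int height) {
        corners = new int[][] {
            {left, top},
            {left + width, top},
            {left + width, top + height},
            {left, top + height}
        };
    }

    /** Moves all four corners in one step so the shape stays a rectangle. */
    void translate(int dx, int dy) {
        for (int[] corner : corners) {
            corner[0] += dx;
            corner[1] += dy;
        }
    }

    /** Returns a copy of one corner, so callers cannot edit a single point in isolation. */
    int[] corner(int index) {
        return corners[index].clone();
    }

    int width() { return corners[1][0] - corners[0][0]; }

    int height() { return corners[3][1] - corners[0][1]; }

    public static void main(String[] args) {
        CornerRectangle r = new CornerRectangle(0, 0, 4, 3);
        r.translate(2, 1);
        System.out.println(r.width() + " x " + r.height());
    }
}
```

The distributed-transaction case is harder precisely because no single class or process can wrap all of the values this way.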
Connascence of Identity (CoI)
Occurs when multiple components must reference the same entity.
The common example of this type of connascence involves two independent components that must share and update a common data structure, such as a distributed queue.
Architects have a harder time determining dynamic connascence because we lack tools to analyze runtime calls as effectively as we can analyze the call graph.
Connascence properties
Connascence is an analysis tool for architects and developers, and some properties of connascence help developers use it wisely. The following is a description of each of these connascence properties:
Strength
Architects determine the strength of connascence by the ease with which a developer can refactor that type of coupling; some types of connascence are demonstrably more desirable than others, as shown in Figure 3-5. Architects and developers can improve the coupling characteristics of their code base by refactoring toward better types of connascence.
Architects should prefer static connascence to dynamic because developers can determine it by simple source code analysis, and modern tools make it trivial to improve static connascence. For example, consider the case of connascence of meaning, which developers can improve by refactoring to connascence of name by creating a named constant rather than a magic value.
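The refactoring from connascence of meaning to connascence of name can be sketched as follows. The status values and all names here are hypothetical, chosen only to show a magic value being replaced by a named constant.

```java
class SeatStatusDemo {
    // Before: every caller must "know" that 1 means reserved (connascence of meaning).
    static boolean isReservedMagic(int status) {
        return status == 1;
    }

    // After: the meaning lives in one named constant (connascence of name).
    static final int STATUS_AVAILABLE = 0;
    static final int STATUS_RESERVED = 1;

    static boolean isReserved(int status) {
        return status == STATUS_RESERVED;
    }

    public static void main(String[] args) {
        System.out.println(isReservedMagic(1));
        System.out.println(isReserved(STATUS_RESERVED));
        System.out.println(isReserved(STATUS_AVAILABLE));
    }
}
```

With the constant in place, changing the underlying value is a one-line edit, and a refactoring tool can rename it everywhere.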
Figure 3-5. The strength of connascence provides a good refactoring guide
Locality
The locality of connascence measures how proximal the modules are to each other in the code base. Proximal code (in the same module) typically has more and higher forms of connascence than more separated code (in separate modules or code bases). In other words, forms of connascence that indicate poor coupling when far apart are fine when closer together. For example, if two classes in the same component have connascence of meaning, it is less damaging to the code base than if two components have the same form of connascence.
Developers must consider strength and locality together. Stronger forms of connascence found within the same module represent less code smell than the same connascence spread apart.
Degree
The degree of connascence relates to the size of its impact: does it impact a few classes or many? Lesser degrees of connascence damage code bases less. In other words, having high dynamic connascence isn’t terrible if you only have a few modules. However, code bases tend to grow, making a small problem correspondingly bigger.
Page-Jones offers three guidelines for using connascence to improve systems modularity:
Minimize overall connascence by breaking the system into encapsulated elements
Minimize any remaining connascence that crosses encapsulation boundaries
Maximize the connascence within encapsulation boundaries
The legendary software architecture innovator Jim Weirich repopularized the concept of connascence and offers two great pieces of advice:
Rule of Degree: convert strong forms of connascence into weaker forms of connascence
Rule of Locality: as the distance between software elements increases, use weaker forms of connascence
Unifying Coupling and Connascence Metrics
So far, we’ve discussed both coupling and connascence, measures from different eras and with different targets. However, from an architect’s point of view, these two views overlap. What Page-Jones identifies as static connascence represents degrees of either incoming or outgoing coupling. Structured programming only cares about in or out, whereas connascence cares about how things are coupled together.
To help visualize the overlap in concepts, consider Figure 3-6. The structured programming coupling concepts appear on the left, while the connascence characteristics appear on the right. For what structured programming called data coupling (method calls), connascence provides advice on how that coupling should manifest. Structured programming didn’t really address the areas covered by dynamic connascence; we encapsulate that concept shortly in “Architectural Quanta and Granularity” on page 92.
Figure 3-6. Unifying coupling and connascence
The problems with 1990s connascence
Several problems exist for architects when applying these useful metrics for analyzing and designing systems. First, these measures look at details at a low level of code, focusing more on code quality and hygiene than on architectural structure. Architects tend to care more about how modules are coupled than about the degree of coupling. For example, an architect cares about synchronous versus asynchronous communication, and doesn’t care so much about how that’s implemented.
The second problem with connascence lies with the fact that it doesn’t really address a fundamental decision that many modern architects must make: should communication in distributed architectures like microservices be synchronous or asynchronous? Referring back to the First Law of Software Architecture, everything is a trade-off. After we discuss the scope of architecture characteristics in Chapter 7, we’ll introduce new ways to think about modern connascence.
From Modules to Components
We use the term module throughout as a generic name for a bundling of related code. However, most platforms support some form of component, one of the key building blocks for software architects. The concept and corresponding analysis of the logical or physical separation has existed since the earliest days of computer science. Yet, with all the writing and thinking about components and separation, developers and architects still struggle with achieving good outcomes.
We’ll discuss deriving components from problem domains in Chapter 8, but we must first discuss another fundamental aspect of software architecture: architecture characteristics and their scope.
CHAPTER 4
Architecture Characteristics Defined
A company decides to solve a particular problem using software, so it gathers a list of requirements for that system. A wide variety of techniques exist for the exercise of requirements gathering, generally defined by the software development process used by the team. But the architect must consider many other factors in designing a software solution, as illustrated in Figure 4-1.
Figure 4-1. A software solution consists of both domain requirements and architectural characteristics
Architects may collaborate on defining the domain or business requirements, but one key responsibility entails defining, discovering, and otherwise analyzing all the things the software must do that aren’t directly related to the domain functionality: architectural characteristics.
What distinguishes software architecture from coding and design? Many things, including the role that architects have in defining architectural characteristics, the important aspects of the system independent of the problem domain. Many organizations describe these features of software with a variety of terms, including nonfunctional requirements, but we dislike that term because it is self-denigrating. Architects created that term to distinguish architecture characteristics from functional requirements, but naming something nonfunctional has a negative impact from a language standpoint: how can teams be convinced to pay enough attention to something “nonfunctional”? Another popular term is quality attributes, which we dislike because it implies after-the-fact quality assessment rather than design. We prefer architecture characteristics because it describes concerns critical to the success of the architecture, and therefore the system as a whole, without discounting its importance.
An architecture characteristic meets three criteria:
Specifies a nondomain design consideration
Influences some structural aspect of the design
Is critical or important to application success
These interlocking parts of our definition are illustrated in Figure 4-2.
Figure 4-2. The differentiating features of architecture characteristics
The definition illustrated in Figure 4-2 consists of the three components listed, in addition to a few modifiers:
Specifies a nondomain design consideration
When designing an application, the requirements specify what the application should do; architecture characteristics specify operational and design criteria for success, concerning how to implement the requirements and why certain choices were made. For example, a common important architecture characteristic specifies a certain level of performance for the application, which often doesn’t appear in a requirements document. Even more pertinent: no requirements document states “prevent technical debt,” but it is a common design consideration for architects and developers. We cover this distinction between explicit and implicit characteristics in depth in “Extracting Architecture Characteristics from Domain Concerns” on page 65.
Influences some structural aspect of the design
The primary reason architects try to describe architecture characteristics on projects concerns design considerations: does this architecture characteristic require special structural consideration to succeed? For example, security is a concern in virtually every project, and all systems must take a baseline of precautions during design and coding. However, it rises to the level of architecture characteristic when the architect needs to design something special. Consider two cases surrounding payment in an example system:
Third-party payment processor
If an integration point handles payment details, then the architecture shouldn’t require special structural considerations. The design should incorporate standard security hygiene, such as encryption and hashing, but doesn’t require special structure.
In-application payment processing
If the application under design must handle payment processing, the architect may design a specific module, component, or service for that purpose to isolate the critical security concerns structurally. Now, the architecture characteristic has an impact on both architecture and design.
Of course, even these two criteria aren’t sufficient in many cases to make this determination: past security incidents, the nature of the integration with the third party, and a host of other criteria may be present during this decision. Still, it shows some of the considerations architects must make when determining how to design for certain capabilities.
Critical or important to application success
Applications could support a huge number of architecture characteristics…but shouldn’t. Support for each architecture characteristic adds complexity to the design. Thus, a critical job for architects lies in choosing the fewest architecture characteristics rather than the most possible.
We further subdivide architecture characteristics into implicit versus explicit architecture characteristics. Implicit ones rarely appear in requirements, yet they’re necessary for project success. For example, availability, reliability, and security underpin virtually all applications, yet they’re rarely specified in design documents. Architects must use their knowledge of the problem domain to uncover these architecture characteristics during the analysis phase. For example, a high-frequency trading firm may not have to specify low latency in every system, yet the architects in that problem domain know how critical it is. Explicit architecture characteristics appear in requirements documents or other specific instructions.
In Figure 4-2, the choice of a triangle is intentional: each of the definition elements supports the others, which in turn support the overall design of the system. The fulcrum created by the triangle illustrates the fact that these architecture characteristics often interact with one another, leading to the pervasive use among architects of the term trade-off.
Architecture characteristics exist along a broad spectrum of the software system, ranging from low-level code characteristics, such as modularity, to sophisticated operational concerns, such as scalability and elasticity. No true universal standard exists despite attempts to codify one in the past. Instead, each organization creates its own interpretation of these terms. Additionally, because the software ecosystem changes so fast, new concepts, terms, measures, and verifications constantly appear, providing new opportunities for architecture characteristics definitions.
Despite the volume and scale, architects commonly separate architecture characteristics into broad categories. The following sections describe a few, along with some examples.
Operational Architecture Characteristics
Operational architecture characteristics cover capabilities such as performance, scalability, elasticity, availability, and reliability. Table 4-1 lists some operational architecture characteristics.
Table 4-1. Common operational architecture characteristics
| Term | Definition |
| :--- | :--- |
| Availability | How long the system will need to be available (if 24/7, steps need to be in place to allow the system to be up and running quickly in case of any failure). |
| Continuity | Disaster recovery capability. |
| Performance | Includes stress testing, peak analysis, analysis of the frequency of functions used, capacity required, and response times. Performance acceptance sometimes requires an exercise of its own, taking months to complete. |
| Recoverability | Business continuity requirements (e.g., in case of a disaster, how quickly is the system required to be online again?). This will affect the backup strategy and requirements for duplicated hardware. |
| Reliability/safety | Assess if the system needs to be fail-safe, or if it is mission critical in a way that affects lives. If it fails, will it cost the company large sums of money? |
| Robustness | Ability to handle error and boundary conditions while running if the internet connection goes down or if there's a power outage or hardware failure. |
| Scalability | Ability for the system to perform and operate as the number of users or requests increases. |
Operational architecture characteristics heavily overlap with operations and DevOps concerns, forming the intersection of those concerns in many software projects.
Structural Architecture Characteristics
Architects must concern themselves with code structure. In many cases, the architect has sole or shared responsibility for code quality concerns, such as good modularity, controlled coupling between components, readable code, and a host of other internal quality assessments. Table 4-2 lists a few structural architecture characteristics.
Table 4-2. Structural architecture characteristics
| Term | Definition |
| :--- | :--- |
| Configurability | Ability for the end users to easily change aspects of the software's configuration (through usable interfaces). |
| Extensibility | How important it is to plug new pieces of functionality in. |
| Installability | Ease of system installation on all necessary platforms. |
| Leverageability/reuse | Ability to leverage common components across multiple products. |
| Localization | Support for multiple languages on entry/query screens in data fields; on reports, multibyte character requirements and units of measure or currencies. |
| Maintainability | How easy is it to apply changes and enhance the system? |
| Portability | Does the system need to run on more than one platform? (For example, does the frontend need to run against Oracle as well as SAP DB?) |
| Supportability | What level of technical support is needed by the application? What level of logging and other facilities are required to debug errors in the system? |
| Upgradeability | Ability to easily/quickly upgrade from a previous version of this application/solution to a newer version on servers and clients. |
Cross-Cutting Architecture Characteristics
While many architecture characteristics fall into easily recognizable categories, many fall outside or defy categorization yet form important design constraints and considerations. Table 4-3 describes a few of these.
Table 4-3. Cross-cutting architecture characteristics
| Term | Definition |
| :--- | :--- |
| Accessibility | Access to all your users, including those with disabilities like colorblindness or hearing loss. |
| Archivability | Will the data need to be archived or deleted after a period of time? (For example, customer accounts are to be deleted after three months or marked as obsolete and archived to a secondary database for future access.) |
| Authentication | Security requirements to ensure users are who they say they are. |
| Authorization | Security requirements to ensure users can access only certain functions within the application (by use case, subsystem, webpage, business rule, field level, etc.). |
| Legal | What legislative constraints is the system operating in (data protection, Sarbanes-Oxley, GDPR, etc.)? What reservation rights does the company require? Any regulations regarding the way the application is to be built or deployed? |
| Privacy | Ability to hide transactions from internal company employees (encrypted transactions so even DBAs and network architects cannot see them). |
| Security | Does the data need to be encrypted in the database? Encrypted for network communication between internal systems? What type of authentication needs to be in place for remote user access? |
| Supportability | What level of technical support is needed by the application? What level of logging and other facilities are required to debug errors in the system? |
| Usability/achievability | Level of training required for users to achieve their goals with the application/solution. Usability requirements need to be treated as seriously as any other architectural issue. |
Any list of architecture characteristics will necessarily be incomplete; any software may invent important architectural characteristics based on unique factors (see “Italy-ility” for an example).
Italy-ility
One of Neal’s colleagues recounts a story about the unique nature of architectural characteristics. She worked for a client whose mandate required a centralized architecture. Yet, for each proposed design, the first question from the client was “But what happens if we lose Italy?” Years ago, because of a freak communication outage, the head office had lost communication with the Italian branches, and it was organizationally traumatic. Thus, a firm requirement of all future architectures insisted upon what the team eventually called Italy-ility, which they all knew meant a unique combination of availability, recoverability, and resilience.
Additionally, many of the preceding terms are imprecise and ambiguous, sometimes because of subtle nuance or the lack of objective definitions. For example, interoperability and compatibility may appear equivalent, which will be true for some systems. However, they differ because interoperability implies ease of integration with other systems, which in turn implies published, documented APIs. Compatibility, on the other hand, is more concerned with industry and domain standards. Another example is learnability. One definition is how easy it is for users to learn to use the software, and another definition is the level at which the system can automatically learn about its environment in order to become self-configuring or self-optimizing using machine learning algorithms.
Many of the definitions overlap. For example, consider availability and reliability, which seem to overlap in almost all cases. Yet consider the internet protocol UDP, which, like TCP, runs on top of IP. UDP is available over IP but not reliable: packets may be lost or arrive out of order, and UDP itself never retransmits them, so the application must detect and re-request any missing data.
No complete list of standards exists. The International Organization for Standardization (ISO) publishes a list organized by capabilities, overlapping many of the ones we’ve listed, but mainly establishing an incomplete category list. The following are some of the ISO definitions:
Performance efficiency
Measure of the performance relative to the amount of resources used under known conditions. This includes time behavior (measure of response, processing times, and/or throughput rates), resource utilization (amounts and types of resources used), and capacity (degree to which the maximum established limits are exceeded).
Compatibility
Degree to which a product, system, or component can exchange information with other products, systems, or components and/or perform its required functions while sharing the same hardware or software environment. It includes coexistence (can perform its required functions efficiently while sharing a common environment and resources with other products) and interoperability (degree to which two or more systems can exchange and utilize information).
Usability
Users can use the system effectively, efficiently, and satisfactorily for its intended purpose. It includes appropriateness recognizability (users can recognize whether the software is appropriate for their needs), learnability (how easily users can learn how to use the software), user error protection (protection against users making errors), and accessibility (make the software available to people with the widest range of characteristics and capabilities).
Reliability
Degree to which a system functions under specified conditions for a specified period of time. This characteristic includes subcategories such as maturity (does the software meet the reliability needs under normal operation), availability (software is operational and accessible), fault tolerance (does the software operate as intended despite hardware or software faults), and recoverability (can the software recover from failure by recovering any affected data and reestablishing the desired state of the system).
Security
Degree to which the software protects information and data so that people or other products or systems have the degree of data access appropriate to their types and levels of authorization. This family of characteristics includes confidentiality (data is accessible only to those authorized to have access), integrity (the software prevents unauthorized access to or modification of software or data), nonrepudiation (can actions or events be proven to have taken place), accountability (can the actions of a user be traced), and authenticity (proving the identity of a user).
Maintainability
Represents the degree of effectiveness and efficiency to which developers can modify the software to improve it, correct it, or adapt it to changes in environment and/or requirements. This characteristic includes modularity (degree to which the software is composed of discrete components), reusability (degree to which developers can use an asset in more than one system or in building other assets), analyzability (how easily developers can gather concrete metrics about the software), modifiability (degree to which developers can modify the software without introducing defects or degrading existing product quality), and testability (how easily developers and others can test the software).
Portability
Degree to which developers can transfer a system, product, or component from one hardware, software, or other operational or usage environment to another. This characteristic includes the subcharacteristics of adaptability (can developers effectively and efficiently adapt the software for different or evolving hardware, software, or other operational or usage environments), installability (can the software be installed and/or uninstalled in a specified environment), and replaceability (how easily developers can replace the functionality with other software).
The last item in the ISO list addresses the functional aspects of software, which we do not believe belongs in this list:
Functional suitability
This characteristic represents the degree to which a product or system provides functions that meet stated and implied needs when used under specified conditions. This characteristic is composed of the following subcharacteristics:
Functional completeness
Degree to which the set of functions covers all the specified tasks and user objectives.
Functional correctness
Degree to which a product or system provides the correct results with the needed degree of precision.
Functional appropriateness
Degree to which the functions facilitate the accomplishment of specified tasks and objectives.
These are not architecture characteristics but rather the motivational requirements to build the software. This illustrates how thinking about the relationship between architecture characteristics and the problem domain has evolved. We cover this evolution in Chapter 7.
The Many Ambiguities in Software Architecture
A consistent frustration amongst architects is the lack of clear definitions of so many critical things, including the activity of software architecture itself! This leads companies to define their own terms for common things, which leads to industry-wide confusion because architects either use opaque terms or, worse yet, use the same terms for wildly different meanings. As much as we’d like, we can’t impose a standard nomenclature on the software development world. However, we do follow and recommend the advice from domain-driven design to establish and use a ubiquitous language amongst fellow employees to help ensure fewer term-based misunderstandings.
Trade-Offs and Least Worst Architecture
Applications can only support a few of the architecture characteristics we’ve listed for a variety of reasons. First, each of the supported characteristics requires design effort and perhaps structural support. Second, the bigger problem lies with the fact that each architecture characteristic often has an impact on others. For example, if an architect wants to improve security, it will almost certainly negatively impact performance: the application must do more on-the-fly encryption, indirection for secrets hiding, and other activities that potentially degrade performance.
A metaphor will help illustrate this interconnectivity. Apparently, pilots often struggle learning to fly helicopters because it requires a control for each hand and each foot, and changing one impacts the others. Thus, flying a helicopter is a balancing exercise, which nicely describes the trade-off process when choosing architecture characteristics. Each architecture characteristic that an architect designs support for potentially complicates the overall design.
Thus, architects rarely encounter the situation where they are able to design a system and maximize every single architecture characteristic. More often, the decisions come down to trade-offs between several competing concerns.
Never shoot for the best architecture, but rather the least worst architecture.
Too many architecture characteristics lead to generic solutions that try to solve every business problem, and those architectures rarely work because the design becomes unwieldy.
This suggests that architects should strive to design architecture to be as iterative as possible. If you can make changes to the architecture more easily, you can stress less about discovering the exact correct thing in the first attempt. One of the most important lessons of Agile software development is the value of iteration; this holds true at all levels of software development, including architecture.
Identifying Architectural Characteristics
Identifying the driving architectural characteristics is one of the first steps in creating an architecture or determining the validity of an existing architecture. Identifying the correct architectural characteristics (“-ilities”) for a given problem or application requires an architect to not only understand the domain problem, but also collaborate with the problem domain stakeholders to determine what is truly important from a domain perspective.
An architect uncovers architecture characteristics in at least three ways: by extracting them from domain concerns, from requirements, and from implicit domain knowledge. We previously discussed implicit characteristics, and we cover the other two here.
Extracting Architecture Characteristics from Domain Concerns
An architect must be able to translate domain concerns to identify the right architectural characteristics. For example, is scalability the most important concern, or is it fault tolerance, security, or performance? Perhaps the system requires all four characteristics combined. Understanding the key domain goals and domain situation allows an architect to translate those domain concerns to “-ilities,” which then form the basis for correct and justifiable architecture decisions.
One tip when collaborating with domain stakeholders to define the driving architecture characteristics is to work hard to keep the final list as short as possible. A common anti-pattern in architecture entails trying to design a generic architecture, one that supports all the architecture characteristics. Each architecture characteristic the architecture supports complicates the overall system design; supporting too many architecture characteristics leads to greater and greater complexity before the architect and developers have even started addressing the problem domain, the original motivation for writing the software. Don’t obsess over the number of characteristics, but rather the motivation to keep design simple.
Case Study: The Vasa
The original story of over-specifying architecture characteristics and ultimately killing a project must be the Vasa. It was a Swedish warship built between 1626 and 1628 by a king who wanted the most magnificent ship ever created. Up until that time, ships were either troop transports or gunships; the Vasa would be both! Most ships had one deck; the Vasa had two! All the cannons were twice the size of those on similar ships. Despite some trepidation by the expert shipbuilders (who ultimately couldn’t say no to King Gustavus Adolphus), the shipbuilders finished the construction. In celebration, the ship sailed out into the harbor and shot a cannon salute off one side. Unfortunately, because the ship was top-heavy, it capsized and sank to the bottom of the bay in Sweden. In 1961, salvagers raised the ship, which now resides in a museum in Stockholm.
Many architects and domain stakeholders want to prioritize the final list of architecture characteristics that the application or system must support. While this is certainly desirable, in most cases it is a fool’s errand and will not only waste time, but also produce a lot of unnecessary frustration and disagreement with the key stakeholders. Rarely will all stakeholders agree on the priority of each and every characteristic. A better approach is to have the domain stakeholders select the top three most important characteristics from the final list (in any order). Not only is this much easier to gain consensus on, but it also fosters discussions about what is most important and helps the architect analyze trade-offs when making vital architecture decisions.
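The top-three selection described above can be tallied mechanically once each stakeholder has made their picks. The following Python sketch is only an illustration under stated assumptions: the function name `tally_top_three`, the stakeholder labels, and the sample ballots are all hypothetical and do not come from any real project. It counts how often each characteristic appears in someone's top three, deliberately ignoring the order of the picks.

```python
from collections import Counter


def tally_top_three(ballots, final_list):
    """Count how many stakeholders placed each characteristic in their top three.

    ballots: dict mapping a stakeholder name to the three characteristics picked.
    final_list: the agreed list of characteristics the picks must come from.
    Returns (characteristic, votes) pairs, most-voted first, ties broken by name.
    """
    allowed = set(final_list)
    counts = Counter()
    for stakeholder, picks in ballots.items():
        unique = set(picks)
        # Each stakeholder picks exactly three distinct items; order is ignored.
        if len(picks) != 3 or len(unique) != 3 or not unique <= allowed:
            raise ValueError(
                f"{stakeholder} must pick three distinct items from the final list"
            )
        counts.update(unique)
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Hypothetical final list and ballots, for illustration only.
    final_list = ["availability", "performance", "scalability", "security", "testability"]
    ballots = {
        "operations": ["availability", "scalability", "performance"],
        "product": ["performance", "testability", "availability"],
        "compliance": ["security", "availability", "performance"],
    }
    for name, votes in tally_top_three(ballots, final_list):
        print(f"{name}: {votes}")
```

The tally is a conversation starter rather than a decision procedure: characteristics that nearly everyone picks are likely drivers, while the ones with a single vote are where the trade-off discussion should focus.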
Most architecture characteristics come from listening to key domain stakeholders and collaborating with them to determine what is important from a domain perspective. While this may seem like a straightforward activity, the problem is that architects and domain stakeholders speak different languages. Architects talk about scalability, interoperability, fault tolerance, learnability, and availability. Domain stakeholders talk about mergers and acquisitions, user satisfaction, time to market, and competitive advantage. What happens is a “lost in translation” problem where the architect and domain stakeholder don’t understand each other. Architects have no idea how to create an architecture to support user satisfaction, and domain stakeholders don’t understand why there is so much focus and talk about availability, interoperability, learnability, and fault tolerance in the application. Fortunately, there is usually a translation from domain concerns to architecture characteristics. Table 5-1 shows some of the more common domain concerns and the corresponding “-ilities” that support them.
Table 5-1. Translation of domain concerns to architecture characteristics
| Domain concern | Architecture characteristics |
| :--- | :--- |
| Mergers and acquisitions | Interoperability, scalability, adaptability, extensibility |
| Time to market | Agility, testability, deployability |
| User satisfaction | Performance, availability, fault tolerance, testability, deployability, agility, security |
| Competitive advantage | Agility, testability, deployability, scalability, availability, fault tolerance |
| Time and budget | Simplicity, feasibility |
One important thing to note is that agility alone does not equal time to market. Rather, time to market is supported by agility + testability + deployability. This is a trap many architects fall into when translating domain concerns: focusing on only one of the ingredients is like forgetting to put the flour in the cake batter. For example, a domain stakeholder might say something like “Due to regulatory requirements, it is absolutely imperative that we complete end-of-day fund pricing on time.” An ineffective architect might focus only on performance because that seems to be the primary focus of that domain concern. However, that architect will fail for many reasons. First, it doesn’t matter how fast the system is if it isn’t available when needed. Second, as the domain grows and more funds are created, the system must also scale to finish end-of-day processing in time. Third, the system must not only be available, but must also be reliable so that it doesn’t crash as end-of-day fund prices are being calculated. Fourth, what happens if the end-of-day fund pricing is about 85% complete and the system crashes? It must be able to recover and restart where the pricing left off. Finally, the system may be fast, but are the fund prices being calculated correctly? So, in addition to performance, the architect must place an equal focus on availability, scalability, reliability, recoverability, and auditability.
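To make the translation in Table 5-1 concrete, here is a minimal sketch of the table as a lookup structure. The class and method names (`ConcernTranslator`, `characteristicsFor`) are hypothetical and only illustrate the idea; the rows are copied from the table.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: names are illustrative, not from the book.
class ConcernTranslator {
    // Rows copied from Table 5-1.
    static final Map<String, List<String>> TABLE = Map.of(
        "Mergers and acquisitions",
            List.of("interoperability", "scalability", "adaptability", "extensibility"),
        "Time to market",
            List.of("agility", "testability", "deployability"),
        "User satisfaction",
            List.of("performance", "availability", "fault tolerance", "testability",
                    "deployability", "agility", "security"),
        "Competitive advantage",
            List.of("agility", "testability", "deployability", "scalability",
                    "availability", "fault tolerance"),
        "Time and budget",
            List.of("simplicity", "feasibility"));

    // Union of the characteristics behind the given concerns, in first-seen order.
    // Unknown concerns contribute nothing.
    static Set<String> characteristicsFor(List<String> concerns) {
        Set<String> result = new LinkedHashSet<>();
        for (String concern : concerns) {
            result.addAll(TABLE.getOrDefault(concern, List.of()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(characteristicsFor(List.of("Time to market", "Time and budget")));
    }
}
```

Looking up a concern returns every ingredient at once, which guards against the single-ingredient trap: asking for "Time to market" yields agility, testability, and deployability together.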
Extracting Architecture Characteristics from Requirements 从需求中提取架构特征
Some architecture characteristics come from explicit statements in requirements documents. For example, explicit expected numbers of users and scale commonly appear in domain concerns. Others come from the architect’s inherent domain knowledge, one of the many reasons that domain knowledge is always beneficial for architects. For example, suppose an architect designs an application that handles class registration for university students. To make the math easy, assume that the school has 1,000 students and 10 hours for registration. Should the architect design the system assuming consistent scale, making the implicit assumption that students will distribute themselves evenly over the registration period? Or, based on knowledge of university students’ habits and proclivities, should the architect design a system that can handle all 1,000 students attempting to register in the last 10 minutes? Anyone who understands how much students stereotypically procrastinate knows the answer to this question! Rarely will details like this appear in requirements documents, yet they do inform the design decisions.
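The difference between the two assumptions is easy to quantify. The sketch below uses the numbers from the registration example; the class and method names are hypothetical.

```java
// Hypothetical helper for the registration arithmetic; the numbers come from the text.
class RegistrationLoad {
    // Average arrivals per minute if `students` arrive evenly over `windowMinutes`.
    static double perMinute(int students, int windowMinutes) {
        return (double) students / windowMinutes;
    }

    public static void main(String[] args) {
        double even = perMinute(1_000, 10 * 60); // spread evenly over 10 hours
        double rush = perMinute(1_000, 10);      // everyone in the last 10 minutes
        System.out.printf("even: %.2f/min, rush: %.2f/min, ratio: %.0fx%n",
                even, rush, rush / even);
    }
}
```

Under the even-distribution assumption the system sees fewer than two registrations per minute; under the procrastination assumption it sees 100 per minute, a sixtyfold difference that no requirements document is likely to spell out.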
The Origin of Architecture Katas 建筑卡塔的起源
A few years ago, Ted Neward, a well-known architect, devised architecture katas, a clever way for nascent architects to practice deriving architecture characteristics from domain-targeted descriptions. The term comes from Japanese martial arts, where a kata is an individual training exercise that emphasizes proper form and technique.
How do we get great designers? Great designers design, of course. 我们如何培养优秀的设计师?优秀的设计师当然是设计的。
-Fred Brooks -弗雷德·布鲁克斯
So how are we supposed to get great architects if they only get the chance to architect fewer than a half dozen times in their career? 那么,如果他们在职业生涯中只有不到六次的机会进行架构设计,我们应该如何培养出优秀的架构师呢?
To provide a curriculum for aspiring architects, Ted created the first architecture katas site, which your authors Neal and Mark adapted and updated. The basic premise of the kata exercise provides architects with a problem stated in domain terms and additional context (things that might not appear in requirements yet impact design). Small teams work for 45 minutes on a design, then show results to the other groups, who vote on who came up with the best architecture. True to its original purpose, architecture katas provide a useful laboratory for aspiring architects. 为了为有志于成为架构师的人提供课程,Ted 创建了第一个架构 katas 网站,您的作者 Neal 和 Mark 对其进行了改编和更新。kata 练习的基本前提是为架构师提供一个用领域术语表述的问题和额外的背景(可能在需求中未出现但影响设计的事物)。小团队在设计上工作 45 分钟,然后向其他小组展示结果,其他小组投票选出谁提出了最佳架构。忠于其最初的目的,架构 katas 为有志于成为架构师的人提供了一个有用的实验室。
Each kata has predefined sections: 每个 kata 都有预定义的部分:
Description 描述
The overall domain problem the system is trying to solve 系统试图解决的整体领域问题
Users 用户
The expected number and/or types of users of the system 系统的预期用户数量和/或类型
Requirements 需求
Domain-level requirements, as an architect might expect from domain users or domain experts
Neal updated the format a few years later on his blog to add the additional context section to each kata with important additional considerations, making the exercises more realistic. Neal 在几年前在他的博客上更新了格式,为每个 kata 添加了额外的上下文部分,包含重要的附加考虑,使练习更加真实。
Additional context 额外的上下文
Many of the considerations an architect must make aren’t explicitly expressed in requirements but instead arise from implicit knowledge of the problem domain
We encourage burgeoning architects to use the site to do their own kata exercise. Anyone can host a brown-bag lunch where a team of aspiring architects can solve a problem and get an experienced architect to evaluate the design and trade-off analysis, either on the spot or from a short analysis after the fact. The design won’t be elaborate because the exercise is timeboxed. Team members ideally get feedback from the experienced architect about missed trade-offs and alternative designs.
Case Study: Silicon Sandwiches 案例研究:硅三明治
To illustrate several concepts, we use an architecture kata (see “The Origin of Architecture Katas” on page 68 for the origin of the concept). To show how architects derive architecture characteristics from requirements, we introduce the Silicon Sandwiches kata. 为了说明几个概念,我们使用一个架构练习(有关该概念的起源,请参见第 68 页的“架构练习的起源”)。为了展示架构师如何从需求中推导出架构特征,我们引入了硅三明治练习。
Description 描述
A national sandwich shop wants to enable online ordering (in addition to its current call-in service). 一家全国连锁三明治店希望启用在线订购(除了目前的电话服务)。
Users 用户
Thousands, perhaps one day millions 成千上万,也许有一天会达到百万
Requirements 需求
Users will place their order, then be given a time to pick up their sandwich and directions to the shop (which must integrate with several external mapping services that include traffic information) 用户将下订单,然后会被告知取餐时间和前往商店的路线(这必须与包括交通信息在内的多个外部地图服务集成)
If the shop offers a delivery service, dispatch the driver with the sandwich to the user 如果商店提供送货服务,请将三明治送到用户那里
Mobile-device accessibility 移动设备可访问性
Offer national daily promotions/specials 提供全国每日促销/特价
Offer local daily promotions/specials 提供本地每日促销/特价
Accept payment online, in person, or upon delivery 在线、亲自或在交付时接受付款
Additional context 额外的上下文
Sandwich shops are franchised, each with a different owner 三明治店是特许经营的,每家店都有不同的老板
Parent company has near-future plans to expand overseas 母公司计划在不久的将来扩展到海外
Corporate goal is to hire inexpensive labor to maximize profit 公司的目标是雇佣廉价劳动力以最大化利润
Given this scenario, how would an architect derive architecture characteristics? Each part of the requirement might contribute to one or more aspects of architecture (and many will not). The architect doesn’t design the entire system here; considerable effort must still go into crafting code to solve the domain statement. Instead, the architect looks for things that influence or impact the design, particularly its structure.
First, separate the candidate architecture characteristics into explicit and implicit characteristics. 首先,将候选架构特性分为显性特性和隐性特性。
Explicit Characteristics 显式特征
Explicit architecture characteristics appear in a requirements specification as part of the necessary design. For example, a shopping website may aspire to support a particular number of concurrent users, which domain analysts specify in the requirements. An architect should consider each part of the requirements to see if it contributes to an architecture characteristic. But first, an architect should consider domain-level predictions about expected metrics, as represented in the Users section of the kata. 显式架构特征在需求规范中作为必要设计的一部分出现。例如,一个购物网站可能希望支持特定数量的并发用户,这一点由领域分析师在需求中指定。架构师应考虑需求的每个部分,以查看它是否有助于架构特征。但首先,架构师应考虑关于预期指标的领域级预测,如在 kata 的用户部分所示。
One of the first details that should catch an architect’s eye is the number of users: currently thousands, perhaps one day millions (this is a very ambitious sandwich shop!). Thus, scalability-the ability to handle a large number of concurrent users without serious performance degradation-is one of the top architecture characteristics. Notice that the problem statement didn’t explicitly ask for scalability, but rather expressed that requirement as an expected number of users. Architects must often decode domain language into engineering equivalents. 建筑师首先应该注意的一个细节是用户数量:目前是成千上万,也许有一天会达到百万(这是一家非常有雄心的三明治店!)。因此,扩展性——在不严重降低性能的情况下处理大量并发用户的能力——是顶级架构特性之一。请注意,问题陈述并没有明确要求扩展性,而是将该要求表达为预期的用户数量。建筑师通常必须将领域语言解码为工程等价物。
However, we also probably need elasticity-the ability to handle bursts of requests. These two characteristics often appear lumped together, but they have different constraints. Scalability looks like the graph shown in Figure 5-1. 然而,我们可能还需要弹性——处理请求高峰的能力。这两个特性通常被归为一类,但它们有不同的约束。可扩展性看起来像图 5-1 所示的图表。
Figure 5-1. Scalability measures the performance of concurrent users 图 5-1. 可扩展性衡量并发用户的性能
Elasticity, on the other hand, measures bursts of traffic, as shown in Figure 5-2. 另一方面,弹性衡量的是流量的突发,如图 5-2 所示。
Figure 5-2. Elastic systems must withstand bursts of users 图 5-2. 弹性系统必须能够承受用户的突发访问
Some systems are scalable but not elastic. For example, consider a hotel reservation system. Absent special sales or events, the number of users is probably consistent. In contrast, consider a concert ticket booking system. As new tickets go on sale, fervent fans will flood the site, requiring high degrees of elasticity. Often, elastic systems also need scalability: the ability to handle bursts and high numbers of concurrent users. 一些系统是可扩展的,但不是弹性的。例如,考虑一个酒店预订系统。在没有特殊促销或活动的情况下,用户数量可能是稳定的。相比之下,考虑一个音乐会票务预订系统。当新票开始销售时,热情的粉丝会涌入网站,这就需要高度的弹性。通常,弹性系统也需要可扩展性:处理突发流量和大量并发用户的能力。
The requirement for elasticity did not appear in the Silicon Sandwiches requirements, yet the architect should identify this as an important consideration. Requirements sometimes state architecture characteristics outright, but some lurk inside the problem domain. Consider a sandwich shop. Is its traffic consistent throughout the day? Or does it endure bursts of traffic around mealtimes? Almost certainly the latter. Thus, a good architect should identify this potential architecture characteristic. 弹性需求并未出现在硅三明治的需求中,但架构师应该将其视为一个重要的考虑因素。需求有时会明确说明架构特性,但有些则潜藏在问题领域中。考虑一家三明治店。它的客流量在一天中是否一致?还是在用餐时间会经历流量高峰?几乎可以肯定是后者。因此,一个好的架构师应该识别出这个潜在的架构特性。
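One rough way to tell the two characteristics apart in traffic data is to compare the peak load with the average load. The sketch below is illustrative only: the "burst ratio" (peak divided by mean) is an assumed measure rather than a definition from the text, and the hourly numbers are made up.

```java
import java.util.Arrays;

// Hypothetical sketch: "burst ratio" (peak / mean) is an illustrative measure,
// not a definition from the text.
class TrafficProfile {
    // Peak concurrent users: a rough proxy for the scalability requirement.
    static int peak(int[] usersPerHour) {
        return Arrays.stream(usersPerHour).max().orElse(0);
    }

    // Peak divided by mean: values well above 1 hint at an elasticity requirement.
    static double burstRatio(int[] usersPerHour) {
        double mean = Arrays.stream(usersPerHour).average().orElse(0);
        return mean == 0 ? 0 : peak(usersPerHour) / mean;
    }

    public static void main(String[] args) {
        int[] hotel = {400, 420, 410, 390, 405, 415}; // made-up steady traffic
        int[] sandwiches = {50, 60, 900, 80, 70, 40};  // made-up lunchtime spike
        System.out.printf("hotel burst ratio: %.2f%n", burstRatio(hotel));
        System.out.printf("sandwich burst ratio: %.2f%n", burstRatio(sandwiches));
    }
}
```

A steady profile like the hotel's stays close to 1, while the mealtime spike pushes the sandwich shop's ratio well above it, even though its average load is lower.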
An architect should consider each of these business requirements in turn to see if architecture characteristics exist: 架构师应逐一考虑这些业务需求,以查看是否存在架构特征:
1. Users will place their order, then be given a time to pick up their sandwich and directions to the shop (which must provide the option to integrate with external mapping services that include traffic information).
External mapping services imply integration points, which may impact aspects such as reliability. For example, if a developer builds a system that relies on a third-party system and a call to it fails, the reliability of the calling system suffers. However, architects must also be wary of over-specifying architecture characteristics. What if the external traffic service is down? Should the Silicon Sandwiches site fail, or should it just offer slightly less efficient directions without traffic information? Architects should always guard against building unnecessary brittleness or fragility into designs.
2. If the shop offers a delivery service, dispatch the driver with the sandwich to the user. 如果商店提供送货服务,请将三明治送到用户那里。
No special architecture characteristics seem necessary to support this requirement. 似乎没有特殊的架构特性来支持这个要求。
3. Mobile-device accessibility. 3. 移动设备可访问性。
This requirement will primarily affect the design of the application, pointing toward building either a portable web application or several native web applications. Given the budget constraints and simplicity of the application, an architect would likely deem it overkill to build multiple applications, so the design points toward a mobile-optimized web application. Thus, the architect may want to define some specific performance architecture characteristics for page load time and other mobile-sensitive characteristics. Notice that the architect shouldn’t act alone in situations like this, but should instead collaborate with user experience designers, domain stakeholders, and other interested parties to vet decisions like this. 该需求将主要影响应用程序的设计,指向构建一个可移植的 Web 应用程序或多个本地 Web 应用程序。考虑到预算限制和应用程序的简单性,架构师可能会认为构建多个应用程序是多余的,因此设计指向一个移动优化的 Web 应用程序。因此,架构师可能希望为页面加载时间和其他对移动敏感的特性定义一些具体的性能架构特性。请注意,架构师在这种情况下不应单独行动,而应与用户体验设计师、领域利益相关者和其他相关方合作,以审查此类决策。
4. Offer national daily promotions/specials. 4. 提供全国性的每日促销/特价。
5. Offer local daily promotions/specials. 5. 提供当地每日促销/特价。
Both of these requirements specify customizability across both promotions and specials. Notice that requirement 1 also implies customized traffic information based on address. Based on all three of these requirements, the architect may consider customizability as an architecture characteristic. For example, an architecture style such as microkernel architecture supports customized behavior extremely well by defining a plug-in architecture. In this case, the default behavior appears in the core, and developers write the optional customized parts, based on location, via plug-ins. However, a traditional design can also accommodate this requirement via design patterns (such as Template Method). This conundrum is common in architecture and requires architects to constantly weigh trade-offs between competing options. We discuss this particular trade-off in more detail in “Design Versus Architecture and Trade-Offs” on page 74.
6. Accept payment online, in person, or upon delivery. 6. 在线、亲自或在交付时接受付款。
Online payments imply security, but nothing in this requirement suggests a particularly heightened level of security beyond what’s implicit. 在线支付意味着安全,但这个要求中没有任何内容表明需要比隐含的更高水平的安全性。
7. Sandwich shops are franchised, each with a different owner. 7. 三明治店是特许经营的,每家店都有不同的老板。
This requirement may impose cost restrictions on the architecture-the architect should check the feasibility (applying constraints like cost, time, and staff skill set) to see if a simple or sacrificial architecture is warranted. 此要求可能对架构施加成本限制——架构师应检查可行性(应用成本、时间和员工技能等约束)以确定是否需要简单或牺牲性的架构。
8. Parent company has near-future plans to expand overseas. 母公司在不久的将来有计划扩展海外。
This requirement implies internationalization, or i18n. Many design techniques exist to handle this requirement, which shouldn’t require special structure to accommodate. This will, however, certainly drive design decisions.
9. Corporate goal is to hire inexpensive labor to maximize profit. 公司的目标是雇佣廉价劳动力以最大化利润。
This requirement suggests that usability will be important, but again is more concerned with design than architecture characteristics. 这个要求表明可用性将是重要的,但再次强调的是更关注设计而不是架构特性。
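Returning to the integration point in requirement 1, the question of what happens when the external traffic service is down can be answered in design rather than in structure. The following is a minimal sketch of that fallback, assuming a hypothetical `DirectionsService` whose traffic lookup is passed in as a `Supplier`; none of these names correspond to a real mapping API.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch: Directions and the supplier-based lookup are illustrative,
// not a real mapping API.
class DirectionsService {
    static final class Directions {
        final String route;
        final Optional<String> trafficNote;

        Directions(String route, Optional<String> trafficNote) {
            this.route = route;
            this.trafficNote = trafficNote;
        }

        @Override
        public String toString() {
            return route + trafficNote.map(note -> " (" + note + ")").orElse("");
        }
    }

    // Always returns directions; traffic information is best-effort.
    static Directions directionsTo(String shopAddress, Supplier<String> trafficLookup) {
        String route = "Route to " + shopAddress;
        try {
            return new Directions(route, Optional.ofNullable(trafficLookup.get()));
        } catch (RuntimeException serviceDown) {
            // Degrade gracefully: the order flow continues without traffic data.
            return new Directions(route, Optional.empty());
        }
    }

    public static void main(String[] args) {
        System.out.println(directionsTo("12 Main St", () -> "light traffic"));
        System.out.println(directionsTo("12 Main St", () -> {
            throw new IllegalStateException("traffic service down");
        }));
    }
}
```

Because the failure is absorbed at the integration point, the site offers slightly less useful directions instead of failing the order.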
The third architecture characteristic we derive from the preceding requirements is performance: no one wants to buy from a sandwich shop that has poor performance, especially at peak times. However, performance is a nuanced concept-what kind of performance should the architect design for? We cover the various nuances of performance in Chapter 6. 我们从前面的需求中得出的第三个架构特性是性能:没有人想在高峰时段光顾一家表现不佳的三明治店。然而,性能是一个微妙的概念——架构师应该设计什么样的性能?我们在第六章中讨论了性能的各种细微差别。
We also want to define performance numbers in conjunction with scalability numbers. In other words, we must establish a baseline of performance without particular scale, as well as determine what an acceptable level of performance is given a certain number of users. Quite often, architecture characteristics interact with one another, forcing architects to define them in relation to one another. 我们还希望在可扩展性指标的基础上定义性能指标。换句话说,我们必须在没有特定规模的情况下建立性能基线,并确定在一定数量的用户下可接受的性能水平。架构特性往往相互影响,迫使架构师将它们相互关联地定义。
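One way to express performance numbers in relation to scale is a table of budgets keyed by user count. This is a sketch only: the thresholds and millisecond values are invented for illustration, and `PerformanceTargets` is a hypothetical name.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: the user counts and millisecond budgets are invented.
class PerformanceTargets {
    // Key: concurrent users at or above which the budget applies.
    // Value: acceptable response time in milliseconds.
    private static final TreeMap<Integer, Integer> BUDGET_MS = new TreeMap<>(Map.of(
        0, 200,        // baseline with no particular scale
        1_000, 300,
        100_000, 500));

    static int acceptableMillis(int concurrentUsers) {
        // floorEntry picks the highest threshold that does not exceed the user count.
        return BUDGET_MS.floorEntry(Math.max(0, concurrentUsers)).getValue();
    }

    static boolean withinBudget(int concurrentUsers, int observedMillis) {
        return observedMillis <= acceptableMillis(concurrentUsers);
    }

    public static void main(String[] args) {
        System.out.println(acceptableMillis(50));                // baseline budget
        System.out.println(withinBudget(50_000, 350));           // over the 300 ms budget
    }
}
```

The first entry is the baseline without particular scale; the later entries state what counts as acceptable once the user count grows.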
Implicit Characteristics 隐式特征
Many architecture characteristics aren’t specified in requirements documents, yet they make up an important aspect of the design. One implicit architecture characteristic the system might want to support is availability: making sure users can access the sandwich site. Closely related to availability is reliability: making sure the site stays up during interactions-no one wants to purchase from a site that continues dropping connections, forcing them to log in again. 许多架构特性在需求文档中并未明确规定,但它们构成了设计的重要方面。系统可能希望支持的一个隐含架构特性是可用性:确保用户可以访问三明治网站。与可用性密切相关的是可靠性:确保网站在交互过程中保持在线——没有人想从一个不断掉线的网站购买,这会迫使他们重新登录。
Security appears as an implicit characteristic in every system: no one wants to create insecure software. However, it may be prioritized depending on criticality, which illustrates the interlocking nature of our definition. An architect considers security an architecture characteristic if it influences some structural aspect of the design and is critical or important to the application. 安全在每个系统中都作为一种隐含特性出现:没有人想要创建不安全的软件。然而,根据关键性,它可能会被优先考虑,这说明了我们定义的相互关联性。如果安全性影响设计的某些结构方面,并且对应用程序至关重要或重要,架构师会将其视为架构特性。
For Silicon Sandwiches, an architect might assume that payments should be handled by a third party. Thus, as long as developers follow general security hygiene (not passing credit card numbers as plain text, not storing too much information, and so on), the architect shouldn’t need any special structural design to accommodate security; good design in the application will suffice. Each architecture characteristic interacts with the others, which leads architects into the common pitfall of over-specifying architecture characteristics. Over-specifying is just as damaging as under-specifying because it overcomplicates the system design.
The last major architecture characteristic that Silicon Sandwiches needs to support encompasses several details from the requirements: customizability. Notice that several parts of the problem domain offer custom behavior: recipes, local sales, and directions that may be locally overridden. Thus, the architecture should support the ability to facilitate custom behavior. Normally, this would fall into the design of the application. However, as our definition specifies, a part of the problem domain that relies on custom structure to support it moves into the realm of an architecture characteristic. This design element isn’t critical to the success of the application though. It is important to note that there are no correct answers in choosing architecture characteristics, only incorrect ones (or, as Mark notes in one of his well-known quotes): 硅三明治需要支持的最后一个主要架构特性涵盖了来自需求的几个细节:可定制性。请注意,问题域的几个部分提供了自定义行为:食谱、本地销售和可能被本地覆盖的方向。因此,架构应该支持促进自定义行为的能力。通常,这将属于应用程序的设计。然而,正如我们的定义所指定的,依赖于自定义结构来支持的一个问题域部分进入了架构特性的领域。这个设计元素对应用程序的成功并不是至关重要的。重要的是要注意,在选择架构特性时没有正确的答案,只有错误的答案(或者,正如马克在他的一句著名引用中所提到的):
There are no wrong answers in architecture, only expensive ones. 在架构中没有错误的答案,只有昂贵的答案。
Design Versus Architecture and Trade-Offs 设计与架构及权衡
In the Silicon Sandwiches kata, an architect would likely identify customizability as a part of the system, but the question then becomes: architecture or design? The architecture implies some structural component, whereas design resides within the architecture. In the customizability case of Silicon Sandwiches, the architect could choose an architecture style like microkernel and build structural support for customization. However, if the architect chose another style because of competing concerns, developers could implement the customization using the Template Method design pattern, which allows parent classes to define workflow that can be overridden in child classes. Which design is better? 在硅三明治的练习中,架构师可能会将可定制性视为系统的一部分,但问题随之而来:架构还是设计?架构意味着某种结构组件,而设计则存在于架构之内。在硅三明治的可定制性案例中,架构师可以选择像微内核这样的架构风格,并为定制构建结构支持。然而,如果架构师因为竞争性考虑选择了另一种风格,开发人员可以使用模板方法设计模式来实现定制,该模式允许父类定义可以在子类中重写的工作流程。哪种设计更好?
As with everything in architecture, it depends on a number of factors. First, are there good reasons, such as performance and coupling, not to implement a microkernel architecture? Second, are other desirable architecture characteristics more difficult to achieve in one design than in the other? Third, how much would it cost to support all the architecture characteristics with the architecture style versus the design pattern? This type of architectural trade-off analysis makes up an important part of an architect’s role.
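To make the two options tangible, the sketch below shows the same local-promotion customization implemented both ways. All class names and the promotion text are hypothetical, and the plug-in variant is reduced to an in-process registry; a real microkernel architecture would involve considerably more structure.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical names throughout; neither variant is the book's implementation.

// Variant 1: Template Method. The parent class fixes the workflow;
// child classes override a single step.
abstract class PromotionWorkflow {
    final String dailySpecial(String region) {
        return "National special" + localAddOn(region);
    }

    // Hook with default behavior; franchise-specific subclasses may override it.
    protected String localAddOn(String region) {
        return "";
    }
}

class DefaultPromotions extends PromotionWorkflow {}

class CoastalPromotions extends PromotionWorkflow {
    @Override
    protected String localAddOn(String region) {
        return " + free chowder in " + region;
    }
}

// Variant 2: microkernel style. The core holds the default behavior and
// looks up optional plug-ins by region.
interface PromotionPlugin {
    String localAddOn(String region);
}

class PromotionCore {
    private final Map<String, PromotionPlugin> plugins = new HashMap<>();

    void register(String region, PromotionPlugin plugin) {
        plugins.put(region, plugin);
    }

    String dailySpecial(String region) {
        PromotionPlugin plugin = plugins.get(region);
        return "National special" + (plugin == null ? "" : plugin.localAddOn(region));
    }
}

class CustomizabilityDemo {
    public static void main(String[] args) {
        System.out.println(new CoastalPromotions().dailySpecial("Maine"));

        PromotionCore core = new PromotionCore();
        core.register("Maine", region -> " + free chowder in " + region);
        System.out.println(core.dailySpecial("Maine"));
        System.out.println(core.dailySpecial("Ohio"));
    }
}
```

Both variants produce the same behavior. The difference is where the variation lives: in the class hierarchy chosen at design time, or in plug-ins registered with a core, which is a structural commitment.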
Above all, it is critical for the architect to collaborate with the developers, project manager, operations team, and other co-constructors of the software system. No architecture decision should be made isolated from the implementation team (which leads to the dreaded Ivory Tower Architect anti-pattern). In the case of Silicon Sandwiches, the architect, tech lead, developers, and domain analysts should collaborate to decide how best to implement customizability. 最重要的是,架构师必须与开发人员、项目经理、运营团队以及软件系统的其他共同构建者进行合作。任何架构决策都不应与实施团队孤立作出(这会导致令人畏惧的象牙塔架构师反模式)。在硅制三明治的情况下,架构师、技术负责人、开发人员和领域分析师应合作决定如何最好地实现可定制性。
An architect could design an architecture that doesn’t accommodate customizability structurally, requiring the design of the application itself to support that behavior (see “Design Versus Architecture and Trade-Offs” on page 74). Architects shouldn’t stress too much about discovering the exactly correct set of architecture characteristics; developers can implement functionality in a variety of ways. However, correctly identifying important structural elements may facilitate a simpler or more elegant design. Architects must remember: there is no best design in architecture, only a least worst collection of trade-offs.
Architects must also prioritize these architecture characteristics toward trying to find the simplest required sets. A useful exercise once the team has made a first pass at identifying the architecture characteristics is to try to determine the least important one-if you must eliminate one, which would it be? Generally, architects are more likely to cull the explicit architecture characteristics, as many of the implicit ones support general success. The way we define what’s critical or important to success assists architects in determining if the application truly requires each architecture characteristic. By attempting to determine the least applicable one, architects can help determine critical necessity. In the case of Silicon Sandwiches, which architecture characteristic that we have identified is least important? Again, no absolute correct answer exists. However, in this case, the solution could lose either customizability or performance. We could eliminate customizability as an architecture characteristic and plan to implement that behavior as part of application design. Of the operational architecture characteristics, performance is likely the least critical for success. Of course, the developers don’t mean to build an application that has terrible performance, but rather one that doesn’t prioritize performance over other characteristics, such as scalability or availability. 架构师还必须优先考虑这些架构特性,以寻找所需的最简单集合。一旦团队对架构特性进行了初步识别,一个有用的练习是尝试确定最不重要的特性——如果必须消除一个,那将是哪个?一般来说,架构师更可能剔除显式的架构特性,因为许多隐式特性支持整体成功。我们定义什么是成功的关键或重要的方式,帮助架构师确定应用程序是否确实需要每个架构特性。通过尝试确定最不适用的特性,架构师可以帮助确定关键的必要性。在硅三明治的案例中,我们识别出的哪个架构特性是最不重要的?同样,没有绝对正确的答案。然而,在这种情况下,解决方案可能会失去可定制性或性能。我们可以将可定制性作为一个架构特性消除,并计划将该行为作为应用程序设计的一部分来实现。在操作架构特性中,性能可能是成功的最不关键因素。 当然,开发人员并不打算构建一个性能糟糕的应用程序,而是一个在可扩展性或可用性等其他特性上不优先考虑性能的应用程序。
Measuring and Governing Architecture Characteristics 测量和管理架构特性
Architects must deal with the extraordinarily wide variety of architecture characteristics across all different aspects of software projects. Operational aspects like performance, elasticity, and scalability commingle with structural concerns such as modularity and deployability. This chapter focuses on concretely defining some of the more common architecture characteristics and building governance mechanisms for them.
Measuring Architecture Characteristics 测量架构特性
Several common problems exist around the definition of architecture characteristics in organizations: 在组织中,关于架构特征的定义存在几个常见问题:
They aren't physics 它们不是物理学
Many architecture characteristics in common usage have vague meanings. For example, how does an architect design for agility or deployability? The industry has wildly differing perspectives on common terms, sometimes driven by legitimate differing contexts, and sometimes accidental. 许多常用的架构特性具有模糊的含义。例如,架构师如何为敏捷性或可部署性进行设计?行业对常见术语的看法差异很大,有时是由于合法的不同背景,有时则是偶然的。
Wildly varying definitions 极其不同的定义
Even within the same organization, different departments may disagree on the definition of critical features such as performance. Until developers, architects, and operations staff agree on a common definition, a proper conversation is difficult.
Too composite 过于复杂
Many desirable architecture characteristics comprise many others at a smaller scale. For example, developers can decompose agility into characteristics such as modularity, deployability, and testability. 许多理想的架构特性在更小的规模上包含许多其他特性。例如,开发人员可以将敏捷性分解为模块化、可部署性和可测试性等特性。
Objective definitions for architecture characteristics solve all three problems: by agreeing organization-wide on concrete definitions for architecture characteristics, teams create a ubiquitous language around architecture. Also, by encouraging objective definitions, teams can unpack composite characteristics to uncover measurable features they can objectively define. 架构特征的目标定义解决了所有三个问题:通过在整个组织内就架构特征的具体定义达成一致,团队围绕架构创建了一种普遍语言。此外,通过鼓励客观定义,团队可以拆解复合特征,以发现可以客观定义的可测量特征。
Operational Measures 操作性度量
Many architecture characteristics have obvious direct measurements, such as performance or scalability. However, even these offer many nuanced interpretations, depending on the team’s goals. For example, perhaps a team measures the average response time for certain requests, a good example of an operational architecture characteristic measure. But if teams only measure the average, what happens if some boundary condition causes 1% of requests to take 10 times longer than others? If the site has enough traffic, the outliers may not even show up in the average. Therefore, a team may also want to measure the maximum response times to catch outliers.
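The effect is easy to reproduce with a small sample. The sketch below uses invented data that mirrors the scenario (99 requests at 100 ms and one at 1,000 ms); the `ResponseTimeStats` class and its nearest-rank percentile are illustrative assumptions, not a prescribed measurement method.

```java
import java.util.Arrays;

// Hypothetical helper; the sample data is invented to mirror the 1% outlier scenario.
class ResponseTimeStats {
    static double average(long[] millis) {
        return Arrays.stream(millis).average().orElse(0);
    }

    static long max(long[] millis) {
        return Arrays.stream(millis).max().orElse(0);
    }

    // Nearest-rank percentile: the smallest sample with at least `percent`%
    // of all samples at or below it. Expects percent in 1..100 and a non-empty array.
    static long percentile(long[] millis, int percent) {
        long[] sorted = millis.clone();
        Arrays.sort(sorted);
        int rank = (percent * sorted.length + 99) / 100; // integer ceiling
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        long[] samples = new long[100];
        Arrays.fill(samples, 100); // 99 requests at 100 ms...
        samples[99] = 1_000;       // ...and one request that takes 10 times longer
        System.out.println("average = " + average(samples));
        System.out.println("p99     = " + percentile(samples, 99));
        System.out.println("max     = " + max(samples));
    }
}
```

The average moves from 100 ms to only 109 ms, and even the 99th percentile stays at 100 ms, so the slow request is visible only in the maximum.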
The Many Flavors of Performance 性能的多种表现形式
Many of the architecture characteristics we describe have multiple, nuanced definitions. Performance is a great example. Many projects look at general performance: for example, how long request and response cycles take for a web application. However, architects and DevOps engineers have performed a tremendous amount of work on establishing performance budgets: specific budgets for specific parts of the application. For example, many organizations have researched user behavior and determined that the optimum time for first-page render (the first visible sign of progress for a webpage, in a browser or mobile device) is 500 ms, or half a second. Most applications fall in the double-digit range for this metric. But for modern sites that attempt to capture as many users as possible, this is an important metric to track, and the organizations behind them have built extremely nuanced measures.
Some of these metrics have additional implications for the design of applications. Many forward-thinking organizations place K-weight budgets for page downloads: a maximum number of bytes’ worth of libraries and frameworks allowed on a particular page. Their rationale behind this structure derives from physics constraints: only so many bytes can travel over a network at a time, especially for mobile devices in high-latency areas. 这些指标对应用程序的设计有额外的影响。许多前瞻性的组织为页面下载设定了 K-weight 预算:在特定页面上允许的库和框架的最大字节数。他们制定这一结构的理由源于物理限制:在网络上同时只能传输有限的字节,尤其是在高延迟区域的移动设备上。
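A K-weight budget lends itself to an automated check. The following is a minimal sketch under stated assumptions: the asset names, sizes, and budget are invented, and `KWeightBudget` is a hypothetical class rather than an existing tool.

```java
import java.util.Map;

// Hypothetical sketch: asset names, sizes, and the budget are invented for illustration.
class KWeightBudget {
    // Total bytes of libraries and frameworks shipped with one page.
    static long totalBytes(Map<String, Long> assetSizes) {
        return assetSizes.values().stream().mapToLong(Long::longValue).sum();
    }

    static boolean withinBudget(Map<String, Long> assetSizes, long budgetBytes) {
        return totalBytes(assetSizes) <= budgetBytes;
    }

    public static void main(String[] args) {
        Map<String, Long> orderPage = Map.of(
            "framework.js", 120_000L,
            "styles.css", 30_000L);
        System.out.println("total bytes: " + totalBytes(orderPage));
        System.out.println("within 200 KB budget: " + withinBudget(orderPage, 200_000L));
    }
}
```

A check like this could run in a build pipeline so that a newly added library that pushes a page over its budget is caught before release.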
High-level teams don’t just establish hard performance numbers; they base their definitions on statistical analysis. For example, say a video streaming service wants to monitor scalability. Rather than set an arbitrary number as the goal, engineers measure the scale over time and build statistical models, then raise alarms if the real-time metrics fall outside the prediction models. A failure can mean two things: the model is incorrect (which teams like to know) or something is amiss (which teams also like to know). 高级团队不仅仅设定严格的性能指标;他们的定义基于统计分析。例如,假设一个视频流媒体服务想要监控可扩展性。工程师们不是设定一个任意的目标,而是随着时间的推移测量规模并建立统计模型,然后如果实时指标超出预测模型,就会发出警报。失败可能意味着两件事:模型不正确(团队希望知道)或出现了问题(团队也希望知道)。
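As a rough illustration of alarming on deviation from a model rather than on a fixed number, the sketch below uses the simplest possible stand-in for a statistical model: a band of the historical mean plus or minus a chosen number of standard deviations. The class name, the sample history, and the choice of band are all assumptions; production systems typically use far richer models.

```java
import java.util.Arrays;

// Hypothetical sketch: a mean +/- k standard deviations band stands in for
// the "statistical model" described in the text.
class ScaleMonitor {
    private final double mean;
    private final double stdDev;
    private final double tolerance; // number of standard deviations allowed

    ScaleMonitor(double[] history, double tolerance) {
        double m = Arrays.stream(history).average().orElse(0);
        double variance = Arrays.stream(history)
                .map(x -> (x - m) * (x - m))
                .average()
                .orElse(0);
        this.mean = m;
        this.stdDev = Math.sqrt(variance);
        this.tolerance = tolerance;
    }

    // True when the observation falls outside the predicted band.
    boolean shouldAlarm(double observed) {
        return Math.abs(observed - mean) > tolerance * stdDev;
    }

    public static void main(String[] args) {
        // Made-up history of concurrent streams, in thousands.
        ScaleMonitor monitor = new ScaleMonitor(new double[]{100, 110, 90, 100}, 3.0);
        System.out.println(monitor.shouldAlarm(105)); // inside the band
        System.out.println(monitor.shouldAlarm(150)); // outside the band
    }
}
```

An alarm from such a check carries the two meanings described above: either the band no longer reflects real usage, or something in the system is amiss.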
The kinds of characteristics that teams can now measure are evolving rapidly, in conjunction with tools and nuanced understanding. For example, many teams have recently focused on performance budgets for metrics such as first contentful paint and first CPU idle, both of which speak volumes about performance issues for users of webpages on mobile devices. As devices, targets, capabilities, and myriad other things change, teams will find new things and ways to measure.
Structural Measures
Some objective measures are not as obvious as performance. What about internal structural characteristics, such as well-defined modularity? Unfortunately, comprehensive metrics for internal code quality don’t yet exist. However, some metrics and common tools do allow architects to address some critical aspects of code structure, albeit along narrow dimensions.
An obvious measurable aspect of code is complexity, defined by the cyclomatic complexity metric.
Cyclomatic Complexity
Cyclomatic Complexity (CC) is a code-level metric, developed by Thomas McCabe, Sr., in 1976, designed to provide an objective measure for the complexity of code at the function/method, class, or application level.
It is computed by applying graph theory to code, specifically decision points, which cause different execution paths. For example, if a function has no decision statements (such as if statements), then $\mathrm{CC} = 1$. If the function had a single conditional, then $\mathrm{CC} = 2$ because two possible execution paths exist.
The formula for calculating the CC for a single function or method is $\mathrm{CC} = E - N + 2$, where $N$ represents nodes (lines of code), and $E$ represents edges (possible decisions). Consider the C-like code shown in Example 6-1.
Example 6-1. Sample code for cyclomatic complexity evaluation
public int decision(int c1, int c2) {
    if (c1 < 100)               // first decision point
        return 0;
    else if (c1 + c2 > 500)     // second decision point
        return 1;
    else
        return -1;
}
The cyclomatic complexity for Example 6-1 is 3 ($= 3 - 2 + 2$); the graph appears in Figure 6-1.
Figure 6-1. Cyclomatic Complexity for the decision function
The number 2 appearing in the cyclomatic complexity formula represents a simplification for a single function/method. For fan-out calls to other methods (known as connected components in graph theory), the more general formula is $\mathrm{CC} = E - N + 2P$, where $P$ represents the number of connected components.
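The general formula is simple enough to express directly. The sketch below is illustrative only; the class name `CyclomaticComplexity` is hypothetical. Counting Example 6-1 as a full control-flow graph (an assumption made here: one node per condition and per return statement, plus a single exit node) gives 7 edges and 6 nodes, which also yields a CC of 3.

```java
/** Hypothetical sketch of the cyclomatic complexity formula CC = E - N + 2P. */
final class CyclomaticComplexity {
    private CyclomaticComplexity() {}

    /** General form: edges minus nodes plus twice the number of connected components. */
    static int of(int edges, int nodes, int connectedComponents) {
        return edges - nodes + 2 * connectedComponents;
    }

    /** Single function or method, where P = 1 and the formula reduces to E - N + 2. */
    static int ofSingleFunction(int edges, int nodes) {
        return of(edges, nodes, 1);
    }
}
```

For a function with no decision statements, modeled as one statement node flowing into one exit node, `ofSingleFunction(1, 2)` gives 1, matching the base case described earlier.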
Architects and developers universally agree that overly complex code represents a code smell; it harms virtually every one of the desirable characteristics of code bases: modularity, testability, deployability, and so on. Yet if teams don’t keep an eye on gradually growing complexity, that complexity will dominate the code base.
What’s a Good Value for Cyclomatic Complexity?
A common question the authors receive when talking about this subject is: what’s a good threshold value for CC? Of course, like all answers in software architecture: it depends! It depends on the complexity of the problem domain. For example, if you have an algorithmically complex problem, the solution will yield complex functions. Some of the key aspects of CC for architects to monitor are these: are functions complex because of the problem domain or because of poor coding? Alternatively, is the code partitioned poorly? In other words, could a large method be broken down into smaller, logical chunks, distributing the work (and complexity) into more well-factored methods?
In general, the industry thresholds for CC suggest that a value under 10 is acceptable, barring other considerations such as complex domains. We consider that threshold very high and would prefer code to fall under five, indicating cohesive, well-factored code. A metrics tool in the Java world, Crap4J, attempts to determine how poor (crappy) your code is by evaluating a combination of CC and code coverage; if CC grows to over 50, no amount of code coverage rescues that code from crappiness. The most terrifying professional artifact Neal ever encountered was a single C function that served as the heart of a commercial software package whose CC was over 800! It was a single function with over 4,000 lines of code, including the liberal use of GOTO statements (to escape impossibly deeply nested loops).
Engineering practices like test-driven development have the accidental (but positive) side effect of generating smaller, less complex methods on average for a given problem domain. When practicing TDD, developers try to write a simple test, then write the smallest amount of code to pass the test. This focus on discrete behavior and good test boundaries encourages well-factored, highly cohesive methods that exhibit low CC.
Process Measures
Some architecture characteristics intersect with software development processes. For example, agility often appears as a desirable feature. However, it is a composite architecture characteristic that architects may decompose into features such as testability and deployability.
Testability is measurable through code coverage tools, available for virtually all platforms, that assess the completeness of testing. Like all software checks, it cannot replace thinking and intent. For example, a code base can have 100% code coverage yet poor assertions that don’t actually provide confidence in code correctness. However, testability is clearly an objectively measurable characteristic. Similarly, teams can measure deployability via a variety of metrics: percentage of successful to failed deployments, how long deployments take, issues/bugs raised by deployments, and a host of others. Each team bears the responsibility to arrive at a good set of measurements that capture useful data for their organization, both in quality and quantity. Many of these measures come down to team priorities and goals.
Agility and its related parts clearly relate to the software development process. However, that process may impact the structure of the architecture. For example, if ease of deployment and testability are high priorities, then an architect would place more emphasis on good modularity and isolation at the architecture level, an example of an architecture characteristic driving a structural decision. Virtually anything within the scope of a software project may rise to the level of an architecture characteristic if it manages to meet our three criteria, forcing an architect to make design decisions to account for it.
Governance and Fitness Functions
Once architects have established architecture characteristics and prioritized them, how can they make sure that developers will respect those priorities? Modularity is a great example of an aspect of architecture that is important but not urgent; on many software projects, urgency dominates, yet architects still need a mechanism for governance.
Governing Architecture Characteristics
Governance, derived from the Greek word kubernan (to steer), is an important responsibility of the architect role. As the name implies, the scope of architecture governance covers any aspect of the software development process that architects (including roles like enterprise architects) want to exert an influence upon. For example, ensuring software quality within an organization falls under the heading of architectural governance because it falls within the scope of architecture, and negligence can lead to disastrous quality problems.
Fortunately, increasingly sophisticated solutions exist to relieve architects of this problem, a good example of the incremental growth in capabilities within the software development ecosystem. The drive toward automation on software projects spawned by Extreme Programming created continuous integration, which led to further automation into operations, which we now call DevOps, continuing through to architectural governance. The book Building Evolutionary Architectures (O’Reilly) describes a family of techniques, called fitness functions, used to automate many aspects of architecture governance.
Fitness Functions
The word “evolutionary” in Building Evolutionary Architectures comes more from evolutionary computing than biology. One of the authors, Dr. Rebecca Parsons, spent some time in the evolutionary computing space, including tools like genetic algorithms. A genetic algorithm executes and produces an answer and then undergoes mutation by well-known techniques defined within the evolutionary computing world. If a developer tries to design a genetic algorithm to produce some beneficial outcome, they often want to guide the algorithm, providing an objective measure indicating the quality of the outcome. That guidance mechanism is called a fitness function: an objective function used to assess how close the output comes to achieving the aim. For example, suppose a developer needed to solve the traveling salesperson problem, a famous problem used as a basis for machine learning. Given a salesperson and a list of cities they must visit, with distances between them, what is the optimum route? If a developer designs a genetic algorithm to solve this problem, one fitness function might evaluate the length of the route, as the shortest possible one represents highest success. Another fitness function might be to evaluate the overall cost associated with the route and attempt to keep cost at a minimum. Yet another might be to evaluate the time the traveling salesperson is away and optimize to shorten the total travel time.
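The route-length fitness function described above can be sketched in a few lines. This is not a genetic algorithm, only the scoring function one might use to guide it; the class name `RouteFitness`, the distance-matrix representation, and the assumption that the tour returns to its starting city are all choices made for this example.

```java
import java.util.List;

/** Hypothetical sketch of a route-length fitness function for the traveling salesperson problem. */
final class RouteFitness {
    private RouteFitness() {}

    /** Total length of a closed tour visiting the cities in the given order; lower is fitter. */
    static double routeLength(double[][] distance, List<Integer> route) {
        double total = 0.0;
        for (int i = 0; i < route.size(); i++) {
            int from = route.get(i);
            int to = route.get((i + 1) % route.size()); // wrap around to return home
            total += distance[from][to];
        }
        return total;
    }

    /** Returns whichever candidate route scores better (shorter) under routeLength. */
    static List<Integer> fitter(double[][] distance, List<Integer> a, List<Integer> b) {
        return routeLength(distance, a) <= routeLength(distance, b) ? a : b;
    }
}
```

Swapping `routeLength` for a cost-based or time-based score would give the alternative fitness functions the paragraph mentions without changing the surrounding algorithm.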
Practices in evolutionary architecture borrow this concept to create an architecture fitness function:
Architecture fitness function
Any mechanism that provides an objective integrity assessment of some architecture characteristic or combination of architecture characteristics
Fitness functions are not some new framework for architects to download, but rather a new perspective on many existing tools. Notice in the definition the phrase any mechanism: the verification techniques for architecture characteristics are as varied as the characteristics themselves. Fitness functions overlap many existing verification mechanisms, depending on the way they are used: as metrics, monitors, unit testing libraries, chaos engineering, and so on, illustrated in Figure 6-2.
Figure 6-2. The mechanisms of fitness functions
Many different tools may be used to implement fitness functions, depending on the architecture characteristics. For example, in “Coupling” on page 44 we introduced metrics to allow architects to assess modularity. Here are a couple of examples of fitness functions that test various aspects of modularity.
Cyclic dependencies
Modularity is an implicit architecture characteristic that most architects care about, because poorly maintained modularity harms the structure of a code base; thus, architects should place a high priority on maintaining good modularity. However, forces work against the architect’s good intentions on many platforms. For example, when coding in any popular Java or .NET development environment, as soon as a developer references a class not already imported, the IDE helpfully presents a dialog asking the developer whether to auto-import the reference. This occurs so often that most programmers develop the habit of swatting the auto-import dialog away like a reflex action. However, arbitrarily importing classes or components between one another spells disaster for modularity. For example, Figure 6-3 illustrates a particularly damaging anti-pattern that architects aspire to avoid.
Figure 6-3. Cyclic dependencies between components
In Figure 6-3, each component references something in the others. Having a network of components such as this damages modularity because a developer cannot reuse a single component without also bringing the others along. And, of course, if the other components are coupled to other components, the architecture tends more and more toward the Big Ball of Mud anti-pattern. How can architects govern this behavior without constantly looking over the shoulders of trigger-happy developers? Code reviews help but happen too late in the development cycle to be effective. If an architect allows a development team to rampantly import across the code base for a week until the code review, serious damage has already occurred in the code base.
The solution to this problem is to write a fitness function to watch for cycles, as shown in Example 6-2.
Example 6-2. Fitness function to detect component cycles
In the code, an architect uses the metrics tool JDepend to check the dependencies between packages. The tool understands the structure of Java packages and fails the test if any cycles exist. An architect can wire this test into the continuous build on a project and stop worrying about the accidental introduction of cycles by trigger-happy developers. This is a great example of a fitness function guarding the important rather than urgent practices of software development: it’s an important concern for architects yet has little impact on day-to-day coding.
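To illustrate the kind of check such a fitness function performs, here is a minimal self-contained sketch of cycle detection over a component dependency graph, using a depth-first search. It is a stand-in written for this explanation and does not use JDepend or reproduce its API; the class name `CycleCheck` and the map-based graph representation are assumptions.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: detects cycles in a component dependency graph with depth-first search. */
final class CycleCheck {
    private CycleCheck() {}

    /** Returns true if the directed dependency graph contains at least one cycle. */
    static boolean containsCycles(Map<String, List<String>> dependencies) {
        Set<String> inProgress = new HashSet<>(); // components on the current search path
        Set<String> finished = new HashSet<>();   // components already proven cycle-free
        for (String component : dependencies.keySet()) {
            if (visit(component, dependencies, inProgress, finished)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String node, Map<String, List<String>> dependencies,
                                 Set<String> inProgress, Set<String> finished) {
        if (finished.contains(node)) {
            return false;
        }
        if (!inProgress.add(node)) {
            return true; // reached a component that is already on the current path
        }
        for (String next : dependencies.getOrDefault(node, List.of())) {
            if (visit(next, dependencies, inProgress, finished)) {
                return true;
            }
        }
        inProgress.remove(node);
        finished.add(node);
        return false;
    }
}
```

Wrapped in a unit test that asserts `containsCycles` is false, a check like this can run on every build, which is the property that makes it useful for governance.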
Distance from the main sequence fitness function
In “Coupling” on page 44, we introduced the more esoteric metric of distance from the main sequence, which architects can also verify using fitness functions, as shown in Example 6-3.
Example 6-3. Distance from the main sequence fitness function
In the code, the architect uses JDepend to establish a threshold for acceptable values, failing the test if a class falls outside the range.
This example illustrates both an objective measure for an architecture characteristic and the importance of collaboration between developers and architects when designing and implementing fitness functions. The intent is not for a group of architects to ascend to an ivory tower and develop esoteric fitness functions that developers cannot understand.
Architects must ensure that developers understand the purpose of the fitness function before imposing it on them.
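For readers who want to see what a distance-from-the-main-sequence check computes, the sketch below uses the commonly cited definitions, which are not restated in this passage and are given here from general knowledge: abstractness $A$ is the ratio of abstract types to all types, instability $I$ is efferent coupling divided by total coupling, and distance is $D = |A + I - 1|$. The class name `MainSequence` and the threshold value are hypothetical, and the code does not use JDepend.

```java
/** Hypothetical sketch of a distance-from-the-main-sequence check, D = |A + I - 1|. */
final class MainSequence {
    private MainSequence() {}

    /** A: abstract types (abstract classes, interfaces) divided by all types. */
    static double abstractness(int abstractTypes, int totalTypes) {
        return totalTypes == 0 ? 0.0 : (double) abstractTypes / totalTypes;
    }

    /** I: efferent (outgoing) coupling divided by efferent plus afferent (incoming) coupling. */
    static double instability(int efferent, int afferent) {
        int total = efferent + afferent;
        return total == 0 ? 0.0 : (double) efferent / total;
    }

    /** D: 0 means the component sits on the main sequence; 1 is as far away as possible. */
    static double distance(double abstractness, double instability) {
        return Math.abs(abstractness + instability - 1.0);
    }

    /** The fitness check itself: pass when the distance stays within an agreed threshold. */
    static boolean withinThreshold(double distance, double threshold) {
        return distance <= threshold;
    }
}
```

The threshold is exactly the kind of value architects and developers should agree on together, since a number chosen without explanation is hard for a team to act on.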
The sophistication of fitness function tools has increased over the last few years, including some special-purpose tools. One such tool is ArchUnit, a Java testing framework inspired by and using several parts of the JUnit ecosystem. ArchUnit provides a variety of predefined governance rules codified as unit tests and allows architects to write specific tests that address modularity. Consider the layered architecture illustrated in Figure 6-4.
Figure 6-4. Layered architecture
When designing a layered monolith such as the one in Figure 6-4, the architect defines the layers for good reason (motivations, trade-offs, and other aspects of the layered architecture are described in Chapter 10). However, how can the architect ensure that developers will respect those layers? Some developers may not understand the importance of the patterns, while others may adopt a “better to ask forgiveness than permission” attitude because of some overriding local concern such as performance. But allowing implementers to erode the reasons for the architecture hurts the long-term health of the architecture.
ArchUnit allows architects to address this problem via a fitness function, shown in Example 6-4.
Example 6-4. ArchUnit fitness function to govern layers
In Example 6-4, the architect defines the desirable relationship between layers and writes a verification fitness function to govern it.
A similar tool in the .NET space, NetArchTest, allows similar tests for that platform; a layer verification in C# appears in Example 6-5.
Example 6-5. NetArchTest for layer dependencies
// Classes in the presentation should not directly reference repositories
var result = Types.InCurrentDomain()
    .That()
    .ResideInNamespace("NetArchTest.SampleLibrary.Presentation")
    .ShouldNot()
    .HaveDependencyOn("NetArchTest.SampleLibrary.Data")
    .GetResult()
    .IsSuccessful;
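The idea behind these layer checks can be sketched without either library. The following self-contained Java example is an illustration only and does not reproduce the ArchUnit or NetArchTest APIs; the class name `LayerRules`, the `Dependency` type, and the layer names are hypothetical. It takes a table of which layers each layer may depend on and reports every observed dependency that breaks the table.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a layer-dependency rule check. */
final class LayerRules {
    private LayerRules() {}

    /** One observed dependency from a type in one layer to a type in another. */
    static final class Dependency {
        final String fromLayer;
        final String toLayer;

        Dependency(String fromLayer, String toLayer) {
            this.fromLayer = fromLayer;
            this.toLayer = toLayer;
        }
    }

    /** Returns every observed dependency that crosses layers without being explicitly allowed. */
    static List<Dependency> violations(Map<String, Set<String>> allowed, List<Dependency> observed) {
        List<Dependency> result = new ArrayList<>();
        for (Dependency dependency : observed) {
            boolean sameLayer = dependency.fromLayer.equals(dependency.toLayer);
            boolean permitted = allowed
                    .getOrDefault(dependency.fromLayer, Set.of())
                    .contains(dependency.toLayer);
            if (!sameLayer && !permitted) {
                result.add(dependency);
            }
        }
        return result;
    }
}
```

A test that asserts the returned list is empty plays the same role as the `IsSuccessful` flag in Example 6-5: any shortcut from the presentation layer straight to data access fails the build.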
Another example of fitness functions is Netflix’s Chaos Monkey and the attendant Simian Army. In particular, the Conformity, Security, and Janitor Monkeys exemplify this approach. The Conformity Monkey allows Netflix architects to define governance rules enforced by the monkey in production. For example, if the architects decided that each service should respond usefully to all RESTful verbs, they build that check into the Conformity Monkey. Similarly, the Security Monkey checks each service for well-known security defects, like ports that shouldn’t be active and configuration errors. Finally, the Janitor Monkey looks for instances that no other services route to anymore. Netflix has an evolutionary architecture, so developers routinely migrate to newer services, leaving old services running with no collaborators. Because services running on the cloud consume money, the Janitor Monkey looks for orphan services and disintegrates them out of production.
The Origin of the Simian Army
When Netflix decided to move its operations to Amazon’s cloud, the architects worried over the fact that they no longer had control over operations: what happens if a defect appears operationally? To solve this problem, they spawned the discipline of Chaos Engineering with the original Chaos Monkey, and eventually the Simian Army. The Chaos Monkey simulated general chaos within the production environment to see how well their system would endure it. Latency was a problem with some AWS instances, thus the Chaos Monkey would simulate high latency (which was such a problem, they eventually created a specialized monkey for it, the Latency Monkey). Tools such as the Chaos Kong, which simulates an entire Amazon data center failure, helped Netflix avoid such outages when they occurred for real.
Chaos engineering offers an interesting new perspective on architecture: it’s not a question of if something will eventually break, but when. Anticipating those breakages and writing tests to prevent them makes systems much more robust.
A few years ago, the influential book The Checklist Manifesto by Atul Gawande (Picador) described how professions such as airline pilots and surgeons use checklists (sometimes legally mandated). It’s not because those professionals don’t know their jobs or are forgetful. Rather, when professionals do a highly detailed job over and over, it becomes easy for details to slip by; a succinct checklist forms an effective reminder. This is the correct perspective on fitness functions: rather than a heavyweight governance mechanism, fitness functions provide a mechanism for architects to express important architectural principles and automatically verify them. Developers know that they shouldn’t release insecure code, but that priority competes with dozens or hundreds of other priorities for busy developers. Tools like the Security Monkey specifically, and fitness functions generally, allow architects to codify important governance checks into the substrate of the architecture.
Scope of Architecture Characteristics
A prevailing axiomatic assumption in the software architecture world has traditionally placed the scope of architecture characteristics at the system level. For example, when architects talk about scalability, they generally couch that discussion around the scalability of the entire system. That was a safe assumption a decade ago, when virtually all systems were monolithic. With the advent of modern engineering techniques and the architecture styles they enabled, such as microservices, the scope of architecture characteristics has narrowed considerably. This is a prime example of an axiom slowly becoming outdated as the software development ecosystem continues its relentless evolution.
During the writing of the Building Evolutionary Architectures book, the authors needed a technique to measure the structural evolvability of particular architecture styles. None of the existing measures offered the correct level of detail. In “Structural Measures” on page 79, we discuss a variety of code-level metrics that allow architects to analyze structural aspects of an architecture. However, all these metrics only reveal low-level details about the code, and cannot evaluate dependent components (such as databases) outside the code base that still impact many architecture characteristics, especially operational ones. For example, no matter how much effort an architect puts into designing a performant or elastic code base, if the system uses a database that doesn’t match those characteristics, the application won’t be successful.
When evaluating many operational architecture characteristics, an architect must consider dependent components outside the code base that will impact those characteristics. Thus, architects need another method to measure these kinds of dependencies. That led the Building Evolutionary Architectures authors to define the term architecture quantum. To understand the architecture quantum definition, we must preview one key metric here, connascence.
Coupling and Connascence
Many of the code-level coupling metrics, such as afferent and efferent coupling (described in “Structural Measures” on page 79), reveal details at too fine-grained a level for architectural analysis. In 1996, Meilir Page-Jones published a book titled What Every Programmer Should Know About Object Oriented Design (Dorset House) that included several new measures of coupling he named connascence, which is defined as follows:
Connascence
Two components are connascent if a change in one would require the other to be modified in order to maintain the overall correctness of the system.
He defined two types of connascence: static, discoverable via static code analysis, and dynamic, concerning runtime behavior. To define the architecture quantum, we needed a measure of how components are “wired” together, which corresponds to the connascence concept. For example, if two services in a microservices architecture share the same class definition of some class, like Address, we say they are statically connascent with each other: changing the shared class requires changes to both services.
For dynamic connascence, we define two types: synchronous and asynchronous. Synchronous calls between two distributed services have the caller wait for the response from the callee. On the other hand, asynchronous calls allow fire-and-forget semantics in event-driven architectures, allowing two different services to differ in operational architecture.
Architectural Quanta and Granularity
Component-level coupling isn’t the only thing that binds software together. Many business concepts semantically bind parts of the system together, creating functional cohesion. To successfully design, analyze, and evolve software, developers must consider all the coupling points that could break.
Many science-literate developers know of the concept of quantum from physics, the minimum amount of any physical entity involved in an interaction. The word quantum derives from Latin, meaning “how great” or “how much.” We have adopted this notion to define an architecture quantum:
Architecture quantum
An independently deployable artifact with high functional cohesion and synchronous connascence
This definition contains several parts, dissected here:
Independently deployable
An architecture quantum includes all the necessary components to function independently from other parts of the architecture. For example, if an application uses a database, it is part of the quantum because the system won’t function without it. This requirement means that virtually all legacy systems deployed using a single database by definition form a quantum of one. However, in the microservices architecture style, each service includes its own database (part of the bounded context driving philosophy in microservices, described in detail in Chapter 17), creating multiple quanta within that architecture.
High functional cohesion
Cohesion in component design refers to how well the contained code is unified in purpose. For example, a Customer component with properties and methods all pertaining to a Customer entity exhibits high cohesion, whereas a Utility component with a random collection of miscellaneous methods would not.
High functional cohesion implies that an architecture quantum does something purposeful. This distinction matters little in traditional monolithic applications with a single database. However, in microservices architectures, developers typically design each service to match a single workflow (a bounded context, as described in “Domain-Driven Design’s Bounded Context” on page 94), thus exhibiting high functional cohesion.
Synchronous connascence
Synchronous connascence implies synchronous calls within an application context or between distributed services that form this architecture quantum. For example, if one service in a microservices architecture calls another one synchronously, each service cannot exhibit extreme differences in operational architecture characteristics. If the caller is much more scalable than the callee, timeouts and other reliability concerns will occur. Thus, synchronous calls create dynamic connascence for the length of the call: if one is waiting for the other, their operational architecture characteristics must be the same for the duration of the call.
Back in Chapter 6, we defined the relationship between traditional coupling metrics and connascence, which didn’t include our new communication connascence measure. We update this diagram in Figure 7-1. 在第 6 章中,我们定义了传统耦合度量与共生性之间的关系,但没有包括我们新的通信共生性度量。我们在图 7-1 中更新了这个图表。
Figure 7-1. Adding quantum connascence to the unified diagram 图 7-1. 将量子共生添加到统一图中
For another example, consider a microservices architecture with a Payment service and an Auction service. When an auction ends, the Auction service sends payment information to the Payment service. However, suppose the Payment service can handle only one payment every 500 ms. What happens when a large number of auctions end at once? A poorly designed architecture would allow the first call to go through and allow the others to time out. Alternatively, an architect might design an asynchronous communication link between Payment and Auction, allowing the message queue to temporarily buffer differences. In this case, asynchronous connascence creates a more flexible architecture. We cover this subject in great detail in Chapter 14.
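The difference between the two designs can be sketched as a small deterministic simulation. It assumes that all auctions end at the same instant, that Payment processes one request at a time, and that synchronous callers give up after 750 ms. That timeout value and the function names are hypothetical; only the 500 ms figure comes from the example above.

```python
from collections import deque

PAYMENT_INTERVAL_MS = 500  # from the example: one payment every 500 ms
CALL_TIMEOUT_MS = 750      # assumed timeout for a blocked synchronous caller


def synchronous_results(n_auctions: int) -> list[str]:
    """All auctions end at t=0 and each calls Payment synchronously.

    Payment serves callers one at a time, so the i-th caller (0-based)
    waits (i + 1) * PAYMENT_INTERVAL_MS and times out if that is too long.
    """
    results = []
    for i in range(n_auctions):
        wait_ms = (i + 1) * PAYMENT_INTERVAL_MS
        results.append("paid" if wait_ms <= CALL_TIMEOUT_MS else "timeout")
    return results


def queued_results(n_auctions: int) -> list[tuple[int, int]]:
    """Auction enqueues every payment at t=0; Payment drains at its own pace.

    Returns (auction_id, completion_time_ms) pairs. No caller is blocked,
    so nothing times out; the queue absorbs the burst.
    """
    queue = deque(range(n_auctions))
    clock_ms = 0
    completed = []
    while queue:
        auction_id = queue.popleft()
        clock_ms += PAYMENT_INTERVAL_MS
        completed.append((auction_id, clock_ms))
    return completed
```

Under these assumptions, `synchronous_results(5)` lets only the first payment through, while `queued_results(5)` completes all five, the last one 2,500 ms after the burst. The queue trades latency for reliability, which is the flexibility the asynchronous link buys.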
Eric Evans’ book Domain-Driven Design (Addison-Wesley Professional) has deeply influenced modern architectural thinking. Domain-driven design (DDD) is a modeling technique that allows for organized decomposition of complex problem domains. DDD defines the bounded context, where everything related to the domain is visible internally but opaque to other bounded contexts. Before DDD, developers sought holistic reuse across common entities within the organization. Yet creating common shared artifacts causes a host of problems, such as coupling, more difficult coordination, and increased complexity. The bounded context concept recognizes that each entity works best within a localized context. Thus, instead of creating a unified Customer class across the entire organization, each problem domain can create its own and reconcile differences at integration points.
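As a minimal sketch of that last idea, the two classes below model a customer separately in two bounded contexts and reconcile them at a single integration point. The billing and shipping contexts, the class names, and the fields are all hypothetical; in a real code base each class would live in its own package rather than in one module.

```python
from dataclasses import dataclass


# Hypothetical billing context: its own notion of a customer.
@dataclass(frozen=True)
class BillingCustomer:
    customer_id: str
    card_token: str


# Hypothetical shipping context: a different notion of the same person.
@dataclass(frozen=True)
class ShippingCustomer:
    customer_id: str
    delivery_address: str


def to_shipping_customer(billing: BillingCustomer, address: str) -> ShippingCustomer:
    """Integration point: translate between contexts, sharing only the identifier."""
    return ShippingCustomer(customer_id=billing.customer_id, delivery_address=address)
```

Neither context sees the other's internal fields, so billing can change how it stores card data without forcing a change on shipping.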
The architecture quantum concept provides the new scope for architecture characteristics. In modern systems, architects define architecture characteristics at the quantum level rather than system level. By looking at a narrower scope for important operational concerns, architects may identify architectural challenges early, leading to hybrid architectures. To illustrate scoping provided by the architecture quantum measure, consider another architecture kata, Going, Going, Gone. 架构量子概念为架构特性提供了新的范围。在现代系统中,架构师在量子层面而非系统层面定义架构特性。通过关注重要操作问题的狭窄范围,架构师可以及早识别架构挑战,从而导致混合架构。为了说明架构量子度量提供的范围,可以考虑另一个架构练习,Going, Going, Gone。
Case Study: Going, Going, Gone 案例研究:去,去,去
In Chapter 5, we introduced the concept of an architecture kata. Consider this one, concerning an online auction company. Here is the description of the architecture kata: 在第五章中,我们介绍了架构练习的概念。考虑这个,关于一个在线拍卖公司的。以下是架构练习的描述:
Description 描述
An auction company wants to take its auctions online to a nationwide scale. Customers choose the auction to participate in, wait until the auction begins, then bid as if they are there in the room with the auctioneer. 一家拍卖公司希望将其拍卖活动在线上扩展到全国范围。客户选择要参与的拍卖,等待拍卖开始,然后像在拍卖师所在的房间里一样出价。
Users 用户
Scale up to hundreds of participants per auction, potentially up to thousands of participants, and as many simultaneous auctions as possible. 扩展到每个拍卖数百名参与者,可能达到数千名参与者,并尽可能多地进行同时拍卖。
Requirements 需求
Auctions must be as real-time as possible. 拍卖必须尽可能实时。
Bidders register with a credit card; the system automatically charges the card if the bidder wins. 投标者使用信用卡注册;如果投标者获胜,系统会自动扣款。
Participants must be tracked via a reputation index. 参与者必须通过声誉指数进行跟踪。
Bidders can see a live video stream of the auction and all bids as they occur. 投标者可以看到拍卖的实时视频流以及所有实时出价。
Both online and live bids must be received in the order in which they are placed. 所有在线和现场投标必须按照提交的顺序接收。
Additional context 额外的上下文
Auction company is expanding aggressively by merging with smaller competitors. 拍卖公司通过与较小的竞争对手合并而积极扩张。
Budget is not constrained. This is a strategic direction. 预算没有限制。这是一个战略方向。
Company just exited a lawsuit where it settled a suit alleging fraud. 公司刚刚结束了一场诉讼,达成了对一起指控欺诈的诉讼的和解。
Just as in “Case Study: Silicon Sandwiches” on page 69, an architect must consider each of these requirements to ascertain architecture characteristics: 正如在第 69 页的“案例研究:硅三明治”中所述,架构师必须考虑每一个这些要求,以确定架构特征:
1. “Nationwide scale,” “scale up to hundreds of participants per auction, potentially up to thousands of participants, and as many simultaneous auctions as possible,” “auctions must be as real-time as possible.”
Each of these requirements implies both scalability to support the sheer number of users and elasticity to support the bursty nature of auctions. While the requirements explicitly call out scalability, elasticity represents an implicit characteristic based on the problem domain. When considering auctions, do users all politely spread themselves out during the course of bidding, or do they become more frantic near the end? Domain knowledge is crucial for architects to pick up implicit architecture characteristics. Given the real-time nature of auctions, an architect will certainly consider performance a key architecture characteristic.
2. “Bidders register with a credit card; the system automatically charges the card if the bidder wins,” “company just exited a lawsuit where it settled a suit alleging fraud.” 2. “投标者使用信用卡注册;如果投标者获胜,系统会自动扣款,” “公司刚刚结束了一场诉讼,和解了一起指控欺诈的诉讼。”
Both these requirements clearly point to security as an architecture characteristic. As covered in Chapter 5, security is an implicit architecture characteristic in virtually every application. Thus, architects rely on the second part of the definition of architecture characteristics, that they influence some structural aspect of the design. Should an architect design something special to accommodate security, or will general design and coding hygiene suffice? Architects have developed techniques for handling credit cards safely via design without necessarily building special structure. For example, as long as developers make sure not to store credit card numbers in plain text, to encrypt while in transit, and so on, then the architect shouldn’t have to build special considerations for security. 这两个要求明确指出安全性作为一种架构特征。如第 5 章所述,安全性在几乎每个应用中都是一种隐含的架构特征。因此,架构师依赖于架构特征定义的第二部分,即它们影响设计的某些结构方面。架构师是否应该设计一些特别的东西来适应安全性,还是一般的设计和编码规范就足够了?架构师已经开发出通过设计安全处理信用卡的技术,而不一定需要构建特殊的结构。例如,只要开发人员确保不以明文存储信用卡号码,在传输过程中进行加密等等,那么架构师就不必为安全性构建特殊的考虑。
However, the second phrase should make an architect pause and ask for further clarification. Clearly, some aspect of security (fraud) was a problem in the past, thus the architect should ask for further input no matter what level of security they design. 然而,第二个短语应该让架构师停下来并要求进一步澄清。显然,过去某些方面的安全性(欺诈)是一个问题,因此无论他们设计什么级别的安全性,架构师都应该要求进一步的意见。
3. “Participants must be tracked via a reputation index.” 3. “参与者必须通过声誉指数进行跟踪。”
This requirement suggests some fanciful names such as “anti-trollability,” but the track part of the requirement might suggest some architecture characteristics such as auditability and loggability. The deciding factor again goes back to the defining characteristic: is this outside the scope of the problem domain? Architects must remember that the analysis to yield architecture characteristics represents only a small part of the overall effort to design and implement an application; a lot of design work happens past this phase! During this part of architecture definition, architects look for requirements with structural impact not already covered by the domain.
Here’s a useful litmus test architects use to distinguish domain characteristics from architecture characteristics: does it require domain knowledge to implement, or is it an abstract architecture characteristic? In the Going, Going, Gone kata, an architect upon encountering the phrase “reputation index” would seek out a business analyst or other subject matter expert to explain what they had in mind. In other words, the phrase “reputation index” isn’t a standard definition like more common architecture characteristics. As a counterexample, when architects discuss elasticity, the ability to handle bursts of users, they can talk about the architecture characteristic purely in the abstract; it doesn’t matter what kind of application they consider: banking, catalog site, streaming video, and so on. Architects must determine whether a requirement isn’t already encompassed by the domain and requires particular structure, which elevates a consideration to architecture characteristic.
4. “Auction company is expanding aggressively by merging with smaller competitors.” 4. “拍卖公司通过与较小的竞争对手合并而积极扩张。”
While this requirement may not have an immediate impact on application design, it might become the determining factor in a trade-off between several options. For example, architects must often choose details such as communication protocols for integration architecture: if integration with newly merged companies isn’t a concern, it frees the architect to choose something highly specific to the problem. On the other hand, an architect may choose something that’s less than perfect to accommodate some additional trade-off, such as interoperability. Subtle implicit architecture characteristics such as this pervade architecture, illustrating why doing the job well presents challenges. 虽然这个要求可能对应用程序设计没有直接影响,但它可能成为多个选项之间权衡的决定性因素。例如,架构师通常必须选择集成架构的细节,例如通信协议:如果与新合并公司的集成不是一个问题,这就使架构师可以选择一些非常具体的问题解决方案。另一方面,架构师可能会选择一些不那么完美的方案,以适应一些额外的权衡,例如互操作性。这种微妙的隐含架构特征充斥着架构,说明为什么做好这项工作会面临挑战。
5. “Budget is not constrained. This is a strategic direction.” 5. “预算没有限制。这是一个战略方向。”
Some architecture katas impose budget restrictions on the solution to represent a common real-world trade-off. The Going, Going, Gone kata, however, does not. This allows the architect to choose more elaborate and/or special-purpose architectures, which will be beneficial given the next requirements.
6. “Bidders can see a live video stream of the auction and all bids as they occur,” “both online and live bids must be received in the order in which they are placed.” 6. “投标者可以看到拍卖的实时视频流和所有投标情况,” “在线和现场投标必须按照其提交的顺序接收。”
This requirement presents an interesting architectural challenge, definitely impacting the structure of the application and exposing the futility of treating architecture characteristics as a system-wide evaluation. Consider availability: is that need uniform throughout the architecture? In other words, is the availability of the one auctioneer more important than availability for one of the hundreds of bidders? Obviously, the architect desires good measures for both, but one is clearly more critical: if the auctioneer cannot access the site, online bids cannot occur for anyone. Reliability commonly appears with availability; it addresses operational aspects such as uptime, as well as data integrity and other measures of how reliable an application is. For example, in an auction site, the architect must ensure that the message ordering is reliably correct, eliminating race conditions and other problems.
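One common way to make message ordering reliable is to stamp each bid with a sequence number at the point of receipt and have consumers release bids strictly in sequence order. The sketch below illustrates that technique; the kata does not prescribe it, and the class names `BidSequencer` and `OrderedBidLog` are hypothetical.

```python
import heapq
import itertools
import threading


class BidSequencer:
    """Stamps each bid with a strictly increasing sequence number on receipt."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def stamp(self, bidder: str, amount: int) -> tuple[int, str, int]:
        with self._lock:  # serialize stamping so no two bids share a number
            return (next(self._counter), bidder, amount)


class OrderedBidLog:
    """Releases bids in sequence order, even if they arrive shuffled."""

    def __init__(self) -> None:
        self._pending: list[tuple[int, str, int]] = []  # min-heap keyed on sequence
        self._next_seq = 1

    def receive(self, stamped_bid: tuple[int, str, int]) -> list[tuple[int, str, int]]:
        heapq.heappush(self._pending, stamped_bid)
        released = []
        # Release only while the smallest pending bid is the next one expected.
        while self._pending and self._pending[0][0] == self._next_seq:
            released.append(heapq.heappop(self._pending))
            self._next_seq += 1
        return released
```

A bid that arrives early is simply held until its predecessors show up, so bidders never see bids out of order, at the cost of a short delay whenever a gap exists.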
This last requirement in the Going, Going, Gone kata highlights the need for a more granular scope in architecture than the system level. Using the architecture quantum measure, architects scope architecture characteristics at the quantum level. For example, in Going, Going, Gone, an architect would notice that different parts of this architecture need different characteristics: streaming bids, online bidders, and the auctioneer are three obvious choices. Architects use the architecture quantum measure as a way to think about deployment, coupling, where data should reside, and communication styles within architectures. In this kata, an architect can analyze the differing architecture characteristics per architecture quantum, leading to hybrid architecture design earlier in the process. 在“Going, Going, Gone” kata 中的最后一个要求强调了在架构中需要比系统级别更细粒度的范围。使用架构量子度量,架构师在量子级别上确定架构特性。例如,在“Going, Going, Gone”中,架构师会注意到该架构的不同部分需要不同的特性:流式竞标、在线竞标者和拍卖师是三个明显的选择。架构师使用架构量子度量作为思考部署、耦合、数据应存放的位置以及架构内通信风格的一种方式。在这个 kata 中,架构师可以分析每个架构量子所具有的不同架构特性,从而在早期阶段实现混合架构设计。
Thus, for Going, Going, Gone, we identified the following quanta and corresponding architecture characteristics: 因此,对于《Going, Going, Gone》,我们确定了以下量子及其相应的架构特征:
Bidder feedback: Encompasses the bid stream and video stream of bids. Architecture characteristics: availability, scalability, performance.
Auctioneer: The live auctioneer. Architecture characteristics: availability, reliability, scalability, elasticity, performance, security.
Bidder: Online bidders and bidding. Architecture characteristics: reliability, availability, scalability, elasticity.
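The quanta and characteristics identified for Going, Going, Gone can be captured as plain data, which makes it easy to see what the quanta share and where they differ. The data below mirrors the list above; the two helper functions are hypothetical and serve only to illustrate scoping characteristics per quantum.

```python
QUANTA = {
    "Bidder feedback": {"availability", "scalability", "performance"},
    "Auctioneer": {
        "availability", "reliability", "scalability",
        "elasticity", "performance", "security",
    },
    "Bidder": {"reliability", "availability", "scalability", "elasticity"},
}


def shared_characteristics(quanta: dict[str, set[str]]) -> set[str]:
    """Characteristics every quantum needs (candidates for system-wide concerns)."""
    return set.intersection(*quanta.values())


def distinguishing_characteristics(quanta: dict[str, set[str]]) -> dict[str, set[str]]:
    """Characteristics each quantum needs beyond the shared baseline."""
    common = shared_characteristics(quanta)
    return {name: chars - common for name, chars in quanta.items()}
```

Here only availability and scalability are common to all three quanta. Everything else differs per quantum, which is the argument for scoping characteristics below the system level.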
Component-Based Thinking 基于组件的思维
In Chapter 3, we discussed modules as a collection of related code. However, architects typically think in terms of components, the physical manifestation of a module. 在第三章中,我们讨论了模块作为相关代码的集合。然而,架构师通常以组件的形式思考,组件是模块的物理表现。
Developers physically package modules in different ways, sometimes depending on their development platform. We call the physical packaging of modules components. Most languages support physical packaging as well: jar files in Java, dll in .NET, gem in Ruby, and so on. In this chapter, we discuss architectural considerations around components, ranging from scope to discovery.
Component Scope 组件范围
Developers find it useful to subdivide the concept of component based on a wide host of factors, a few of which appear in Figure 8-1. 开发人员发现根据多种因素细分组件的概念是有用的,其中一些因素出现在图 8-1 中。
Components offer a language-specific mechanism to group artifacts together, often nesting them to create stratification. As shown in Figure 8-1, the simplest component wraps code at a higher level of modularity than classes (or functions, in non-object-oriented languages). This simple wrapper is often called a library, which tends to run in the same address space as the calling code and communicate via language function call mechanisms. Libraries are usually compile-time dependencies (with notable exceptions like dynamic link libraries [DLLs] that were the bane of Windows users for many years).
Figure 8-1. Different varieties of components 图 8-1. 不同种类的组件
Components also appear as subsystems or layers in architecture, as the deployable unit of work for many event processors. Another type of component, a service, tends to run in its own address space and communicates via low-level networking protocols like TCP/IP or higher-level formats like REST or message queues, forming standalone, deployable units in architectures like microservices. 组件在架构中也作为子系统或层出现,作为许多事件处理器的可部署工作单元。另一种类型的组件,服务,倾向于在其自己的地址空间中运行,并通过低级网络协议如 TCP/IP 或更高级的格式如 REST 或消息队列进行通信,形成在微服务等架构中的独立可部署单元。
Nothing requires an architect to use components; it just so happens that it’s often useful to have a higher level of modularity than the lowest level offered by the language. For example, in microservices architectures, simplicity is one of the architectural principles. Thus, a service may consist of enough code to warrant components or may be simple enough to just contain a small bit of code, as illustrated in Figure 8-2.
Components form the fundamental modular building block in architecture, making them a critical consideration for architects. In fact, one of the primary decisions an architect must make concerns the top-level partitioning of components in the architecture. 组件是架构中基本的模块化构建块,因此它们是架构师必须考虑的关键因素。实际上,架构师必须做出的主要决策之一涉及架构中组件的顶层划分。
Figure 8-2. A microservice might have so little code that components aren’t necessary 图 8-2. 一个微服务可能代码量非常少,以至于不需要组件
Architect Role 架构师角色
Typically, the architect defines, refines, manages, and governs components within an architecture. Software architects, in collaboration with business analysts, subject matter experts, developers, QA engineers, operations, and enterprise architects, create the initial design for software, incorporating the architecture characteristics discussed in Chapter 4 and the requirements for the software system. 通常,架构师定义、细化、管理和治理架构中的组件。软件架构师与业务分析师、主题专家、开发人员、质量保证工程师、运营人员和企业架构师合作,创建软件的初始设计,结合第 4 章讨论的架构特性和软件系统的需求。
Virtually all the details we cover in this book exist independently from whatever software development process teams use: architecture is independent from the development process. The primary exception to this rule entails the engineering practices pioneered in the various flavors of Agile software development, particularly in the areas of deployment and automating governance. However, in general, software architecture exists separate from the process. Thus, architects ultimately don’t care where requirements originate: a formal Joint Application Design (JAD) process, lengthy waterfall-style analysis and design, Agile story cards…or any hybrid variation of those. 本书中涵盖的几乎所有细节都独立于团队使用的任何软件开发过程:架构与开发过程是独立的。这个规则的主要例外涉及在各种敏捷软件开发形式中开创的工程实践,特别是在部署和自动化治理方面。然而,通常情况下,软件架构是与过程分开的。因此,架构师最终并不关心需求的来源:正式的联合应用设计(JAD)过程、冗长的瀑布式分析和设计、敏捷故事卡……或这些的任何混合变体。
Generally the component is the lowest level of the software system an architect interacts directly with, with the exception of many of the code quality metrics discussed in Chapter 6 that affect code bases holistically. Components consist of classes or functions (depending on the implementation platform), whose design falls under the responsibility of tech leads or developers. It’s not that architects shouldn’t involve themselves in class design (particularly when discovering or applying design patterns), but they should avoid micromanaging each decision from top to bottom in the system. If architects never allow other roles to make decisions of consequence, the organization will struggle with empowering the next generation of architects. 通常,组件是架构师直接交互的最低级别的软件系统,除了第 6 章中讨论的许多影响代码库整体的代码质量指标。组件由类或函数组成(取决于实现平台),其设计由技术负责人或开发人员负责。并不是说架构师不应该参与类设计(特别是在发现或应用设计模式时),但他们应该避免从系统的顶部到底部微观管理每一个决策。如果架构师从不允许其他角色做出重要决策,组织将难以赋权下一代架构师。
An architect must identify components as one of the first tasks on a new project. But before an architect can identify components, they must know how to partition the architecture. 架构师必须在新项目的第一项任务中识别组件。但在架构师能够识别组件之前,他们必须知道如何划分架构。
Architecture Partitioning 架构分区
The First Law of Software Architecture states that everything in software is a trade-off, including how architects create components in an architecture. Because components represent a general containership mechanism, an architect can build any type of partitioning they want. Several common styles exist, with different sets of trade-offs. We discuss architecture styles in depth in Part II. Here we discuss an important aspect of styles, the top-level partitioning in an architecture.
Consider the two types of architecture styles shown in Figure 8-3. 考虑图 8-3 中显示的两种架构风格。
Figure 8-3. Two types of top-level architecture partitioning: layered and modular 图 8-3. 两种类型的顶层架构分区:分层和模块化
In Figure 8-3, one type of architecture familiar to many is the layered monolith (discussed in detail in Chapter 10). The other is an architecture style popularized by Simon Brown called a modular monolith, a single deployment unit associated with a database and partitioned around domains rather than technical capabilities. These two styles represent different ways to top-level partition the architecture. Note that in each variation, each of the top-level components (layers or components) likely has other components embedded within. The top-level partitioning is of particular interest to architects because it defines the fundamental architecture style and way of partitioning code. 在图 8-3 中,许多人熟悉的一种架构是分层单体(在第 10 章中详细讨论)。另一种是由西蒙·布朗推广的架构风格,称为模块化单体,它是一个与数据库相关的单一部署单元,围绕领域而不是技术能力进行分区。这两种风格代表了对架构进行顶层分区的不同方式。请注意,在每种变体中,每个顶层组件(层或组件)可能都有其他嵌入的组件。顶层分区对架构师特别重要,因为它定义了基本的架构风格和代码分区的方式。
Organizing architecture based on technical capabilities like the layered monolith represents technical top-level partitioning. A common version of this appears in Figure 8-4. 基于技术能力(如分层单体)组织架构代表了技术顶级分区。这种常见版本出现在图 8-4 中。
Figure 8-4. Two types of top-level partitioning in architecture 图 8-4. 架构中的两种顶级分区类型
In Figure 8-4, the architect has partitioned the functionality of the system into technical capabilities: presentation, business rules, services, persistence, and so on. This way of organizing a code base certainly makes sense. All the persistence code resides in one layer in the architecture, making it easy for developers to find persistence-related code. Even though the basic concept of layered architecture predates it by decades, the Model-View-Controller design pattern matches with this architectural pattern, making it easy for developers to understand. Thus, it is often the default architecture in many organizations. 在图 8-4 中,架构师将系统的功能划分为技术能力:表现、业务规则、服务、持久性等。这种组织代码库的方式无疑是合理的。所有持久性代码都位于架构的一个层中,使开发人员能够轻松找到与持久性相关的代码。尽管分层架构的基本概念早在几十年前就已存在,但模型-视图-控制器设计模式与这种架构模式相匹配,使开发人员易于理解。因此,它通常是许多组织的默认架构。
An interesting side effect of the predominance of the layered architecture relates to how companies seat different project roles. When using a layered architecture, it makes some sense to have all the backend developers sit together in one department, the DBAs in another, the presentation team in another, and so on. Conway’s law explains why this arrangement feels natural in those organizations.
Conway's Law 康威定律
Back in the late 1960s, Melvin Conway made an observation that has become known as Conway’s law: 在 1960 年代末,梅尔文·康威提出了一个观察,这个观察被称为康威定律:
Organizations which design systems … are constrained to produce designs which are copies of the communication structures of these organizations. 设计系统的组织……受到限制,只能生成与这些组织的沟通结构相似的设计。
Paraphrased, this law suggests that when a group of people designs some technical artifact, the communication structures between the people end up replicated in the design. People at all levels of organizations see this law in action, and they sometimes make decisions based on it. For example, it is common for organizations to partition workers based on technical capabilities, which makes sense from a pure organizational sense but hampers collaboration because of artificial separation of common concerns.
A related observation coined by Jonny Leroy of ThoughtWorks is the Inverse Conway Maneuver, which suggests evolving team and organizational structure together to promote the desired architecture. 由 ThoughtWorks 的 Jonny Leroy 提出的一个相关观察是逆康威操作(Inverse Conway Maneuver),它建议团队和组织结构应共同演变,以促进所需的架构。
The other architectural variation in Figure 8-4 represents domain partitioning, inspired by the Eric Evans book Domain-Driven Design, which describes a modeling technique for decomposing complex software systems. In DDD, the architect identifies domains or workflows that are independent and decoupled from each other. The microservices architecture style (discussed in Chapter 17) is based on this philosophy. In a modular monolith, the architect partitions the architecture around domains or workflows rather than technical capabilities. As components often nest within one another, each of the components in Figure 8-4 in the domain partitioning (for example, CatalogCheckout) may use a persistence library and have a separate layer for business rules, but the top-level partitioning revolves around domains.
One of the fundamental distinctions between different architecture patterns is what type of top-level partitioning each supports, which we cover for each individual pattern. It also has a huge impact on how an architect initially identifies components: does the architect want to partition things technically or by domain?
Architects using technical partitioning organize the components of the system by technical capabilities: presentation, business rules, persistence, and so on. Thus, one of the organizing principles of this architecture is separation of technical concerns. This in turn creates useful levels of decoupling: if the service layer is only connected to the persistence layer below and business rules layer above, then changes in persistence will only potentially affect those layers. This style of partitioning provides a decoupling technique, reducing rippling side effects on dependent components. We cover more details of this architecture style in the layered architecture pattern in Chapter 10. It is certainly logical to organize systems using technical partitioning, but, like all things in software architecture, this offers some trade-offs. 使用技术分区的架构师通过技术能力来组织系统的组件:表示、业务规则、持久性等。因此,这种架构的一个组织原则是技术关注点的分离。这反过来又创造了有用的解耦级别:如果服务层仅与下面的持久性层和上面的业务规则层相连,那么持久性中的变化只会潜在地影响这些层。这种分区风格提供了一种解耦技术,减少了对依赖组件的涟漪副作用。我们在第 10 章的分层架构模式中详细介绍了这种架构风格。使用技术分区来组织系统无疑是合乎逻辑的,但与软件架构中的所有事物一样,这也提供了一些权衡。
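The decoupling that layering provides depends on each layer talking only to its neighbor, and that rule is simple enough to check mechanically. The sketch below assumes closed layers in the order named above, where each layer may depend only on the layer directly beneath it. The function name and the tuple-based dependency format are hypothetical.

```python
# Layers from top to bottom, in the order described in the text.
LAYERS = ["presentation", "business_rules", "services", "persistence"]


def violations(dependencies: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Return (source, target) pairs that break the closed-layer rule.

    Assumption: a layer may depend only on the layer directly beneath it,
    so skipping a layer or pointing upward both count as violations.
    """
    bad = []
    for source, target in dependencies:
        if LAYERS.index(target) - LAYERS.index(source) != 1:
            bad.append((source, target))
    return bad
```

For example, a dependency from `presentation` straight to `persistence` would be reported, because a change in persistence could then ripple past the service layer.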
The separation enforced by technical partitioning enables developers to find certain categories of the code base quickly, as it is organized by capabilities. However, most realistic software systems require workflows that cut across technical capabilities. Consider the common business workflow of CatalogCheckout. The code to handle CatalogCheckout in the technically layered architecture appears in all the layers, as shown in Figure 8-5. 技术分区强制的分离使开发人员能够快速找到代码库中的某些类别,因为它是按功能组织的。然而,大多数现实的软件系统需要跨越技术能力的工作流程。考虑常见的业务工作流程 CatalogCheckout。在技术分层架构中处理 CatalogCheckout 的代码出现在所有层中,如图 8-5 所示。
Figure 8-5. Where domains/workflows appear in technical- and domain-partitioned architectures 图 8-5. 域/工作流在技术分区和领域分区架构中的出现位置
In Figure 8-5, in the technically partitioned architecture, CatalogCheckout appears in all the layers; the domain is smeared across the technical layers. Contrast this with domain partitioning, which uses a top-level partitioning that organizes components by domain rather than technical capabilities. In Figure 8-5, architects designing the domain-partitioned architecture build top-level components around workflows and/or domains. Each component in the domain partitioning may have subcomponents, including layers, but the top-level partitioning focuses on domains, which better reflects the kinds of changes that most often occur on projects. 在图 8-5 中,在技术分区架构中,CatalogCheckout 出现在所有层中;领域在技术层之间被模糊化。与此对比,领域分区使用的是一种顶层分区,按领域而非技术能力组织组件。在图 8-5 中,设计领域分区架构的架构师围绕工作流和/或领域构建顶层组件。领域分区中的每个组件可能有子组件,包括层,但顶层分区专注于领域,这更好地反映了项目中最常发生的变化类型。
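The smearing effect can be made concrete by recording which workflows each top-level component contains code for, then counting how many top-level components a single workflow touches. CatalogCheckout comes from the text; the second workflow, `UpdateInventory`, and the exact layer names are hypothetical placeholders.

```python
# Technical partitioning: every layer holds a slice of every workflow.
TECHNICAL = {
    "presentation": {"CatalogCheckout", "UpdateInventory"},
    "business_rules": {"CatalogCheckout", "UpdateInventory"},
    "persistence": {"CatalogCheckout", "UpdateInventory"},
}

# Domain partitioning: each top-level component holds one whole workflow.
DOMAIN = {
    "CatalogCheckout": {"CatalogCheckout"},
    "UpdateInventory": {"UpdateInventory"},
}


def components_touched(partitioning: dict[str, set[str]], workflow: str) -> list[str]:
    """Top-level components that must change when the given workflow changes."""
    return [top for top, workflows in partitioning.items() if workflow in workflows]
```

A change to CatalogCheckout touches three top-level components under technical partitioning and one under domain partitioning. Layers may still exist inside the domain component, but they are no longer the top-level boundary.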
Neither of these styles is more correct than the other; refer to the First Law of Software Architecture. That said, we have observed a decided industry trend over the last few years toward domain partitioning for both monolithic and distributed (for example, microservices) architectures. Either way, it is one of the first decisions an architect must make.
Case Study: Silicon Sandwiches: Partitioning 案例研究:硅三明治:分区
Consider the case of one of our example katas, “Case Study: Silicon Sandwiches” on page 69. When deriving components, one of the fundamental decisions facing an architect is the top-level partitioning. Consider the first of two different possibilities for Silicon Sandwiches, a domain partitioning, illustrated in Figure 8-6. 考虑我们示例中的一个案例,“案例研究:硅三明治”,见第 69 页。在推导组件时,架构师面临的基本决策之一是顶层划分。考虑硅三明治的两种不同可能性中的第一种,即域划分,如图 8-6 所示。
Figure 8-6. A domain-partitioned design for Silicon Sandwiches 图 8-6. 硅三明治的领域分区设计
In Figure 8-6, the architect has designed around domains (workflows), creating discrete components for Purchase, Promotion, MakeOrder, ManageInventory, Recipes, Delivery, and Location. Within many of these components resides a subcomponent to handle the various types of customization required, covering both common and local variations. 在图 8-6 中,架构师围绕领域(工作流)进行了设计,为购买、促销、下订单、管理库存、食谱、交付和位置创建了离散组件。在许多这些组件中,存在一个子组件来处理所需的各种类型的定制,涵盖了常见和地方性的变体。
An alternative design isolates the common and local parts into their own partition, illustrated in Figure 8-7. Common and Local represent top-level components, with Purchase and Delivery remaining to handle the workflow.
Which is better? It depends! Each partitioning offers different advantages and drawbacks. 哪个更好?这要看情况!每种分区都有不同的优点和缺点。
Figure 8-7. A technically partitioned design for Silicon Sandwiches 图 8-7. 硅三明治的技术分区设计
Domain partitioning 领域分区
Domain-partitioned architectures separate top-level components by workflows and/or domains. 域分区架构通过工作流和/或领域将顶级组件分开。
Advantages 优势
Modeled more closely toward how the business functions rather than an implementation detail 更贴近业务运作方式的建模,而不是实现细节
Easier to utilize the Inverse Conway Maneuver to build cross-functional teams around domains 更容易利用逆康威操作来围绕领域构建跨职能团队
Aligns more closely to the modular monolith and microservices architecture styles 更接近模块化单体和微服务架构风格
Message flow matches the problem domain 消息流与问题域相匹配
Easy to migrate data and components to distributed architecture 易于将数据和组件迁移到分布式架构
Disadvantage 缺点
Customization code appears in multiple places 自定义代码出现在多个地方
Technical partitioning 技术分区
Technically partitioned architectures separate top-level components based on technical capabilities rather than discrete workflows. This may manifest as layers inspired by Model-View-Controller separation or some other ad hoc technical partitioning. Figure 8-7 separates components based on customization. 技术分区架构根据技术能力而非离散工作流来分离顶层组件。这可能表现为受模型-视图-控制器分离启发的层或其他临时技术分区。图 8-7 根据定制化分离组件。
Advantages 优势
Clearly separates customization code. 清楚地分离定制代码。
Aligns more closely to the layered architecture pattern. 更紧密地与分层架构模式对齐。
Disadvantages 缺点
Higher degree of global coupling. Changes to either the Common or Local component will likely affect all the other components. 更高程度的全局耦合。对 Common 或 Local 组件的更改可能会影响所有其他组件。
Developers may have to duplicate domain concepts in both common and local layers. 开发人员可能需要在公共层和本地层中重复领域概念。
Typically higher coupling at the data level. In a system like this, the application and data architects would likely collaborate to create a single database, including customization and domains. That in turn creates difficulties in untangling the data relationships if the architects later want to migrate this architecture to a distributed system. 通常在数据层面上耦合度较高。在这样的系统中,应用程序和数据架构师可能会合作创建一个单一的数据库,包括定制和领域。这反过来又会在架构师后来想将此架构迁移到分布式系统时,造成理清数据关系的困难。
Many other factors contribute to an architect’s decision on what architecture style to base their design upon, covered in Part II. 许多其他因素会影响架构师决定基于何种架构风格进行设计,这在第二部分中进行了讨论。
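The trade-off between the two partitionings can be made concrete with a small sketch. The component names come from the text; which domain components actually contain a customization subcomponent is an assumption made for illustration (the text only says "many" do), as are the helper name and the `"workflow"`/`"customization"` labels.

```python
# Hypothetical layouts for Silicon Sandwiches. Each top-level component lists
# the kinds of code it contains. Which components carry customization in the
# domain-partitioned design is an illustrative assumption.
DOMAIN_PARTITIONED = {
    "Purchase": ["workflow", "customization"],
    "Promotion": ["workflow", "customization"],
    "MakeOrder": ["workflow", "customization"],
    "ManageInventory": ["workflow"],
    "Recipes": ["workflow", "customization"],
    "Delivery": ["workflow", "customization"],
    "Location": ["workflow"],
}

TECHNICALLY_PARTITIONED = {
    "Common": ["customization"],
    "Local": ["customization"],
    "Purchase": ["workflow"],
    "Delivery": ["workflow"],
}


def components_containing(layout: dict[str, list[str]], concern: str) -> list[str]:
    """Return the top-level components that hold code for the given concern."""
    return sorted(name for name, concerns in layout.items() if concern in concerns)


domain_hits = components_containing(DOMAIN_PARTITIONED, "customization")
technical_hits = components_containing(TECHNICALLY_PARTITIONED, "customization")
print(len(domain_hits), domain_hits)
print(len(technical_hits), technical_hits)
```

Under these assumptions, customization code is spread over five components in the domain-partitioned layout and concentrated in two in the technically partitioned one, which mirrors the disadvantage listed for domain partitioning and the first advantage listed for technical partitioning.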
Developer Role 开发者角色
Developers typically take components, jointly designed with the architect role, and further subdivide them into classes, functions, or subcomponents. In general, class and function design is the shared responsibility of architects, tech leads, and developers, with the lion’s share going to developer roles. 开发人员通常会将与架构师共同设计的组件进一步细分为类、函数或子组件。一般来说,类和函数的设计是架构师、技术负责人和开发人员的共同责任,其中大部分工作由开发人员承担。
Developers should never take components designed by architects as the last word; all software design benefits from iteration. Rather, that initial design should be viewed as a first draft, where implementation will reveal more details and refinements. 开发人员不应将架构师设计的组件视为最终定论;所有软件设计都受益于迭代。相反,初始设计应被视为第一稿,实施将揭示更多细节和改进。
Component Identification Flow 组件识别流程
Component identification works best as an iterative process, producing candidates and refinements through feedback, illustrated in Figure 8-8. 组件识别作为一个迭代过程效果最佳,通过反馈产生候选项和改进,如图 8-8 所示。
Figure 8-8. Component identification cycle 图 8-8. 组件识别周期
This cycle describes a generic architecture exposition cycle. Certain specialized domains may insert other steps in this process or change it altogether. For example, in some domains, some code must undergo security or auditing steps in this process. Descriptions of each step in Figure 8-8 appear in the following sections. 该周期描述了一个通用的架构展示周期。某些专业领域可能会在此过程中插入其他步骤或完全改变它。例如,在某些领域,某些代码必须在此过程中经过安全或审计步骤。图 8-8 中每个步骤的描述将在以下部分中出现。
Identifying Initial Components 识别初始组件
Before any code exists for a software project, the architect must somehow determine what top-level components to begin with, based on what type of top-level partitioning they choose. Outside that, an architect has the freedom to make up whatever components they want, then map domain functionality to them to see where behavior should reside. While this may sound arbitrary, it’s hard to start with anything more concrete if an architect designs a system from scratch. The likelihood of achieving a good design from this initial set of components is discouragingly small, which is why architects must iterate on component design to improve it. 在软件项目的代码存在之前,架构师必须以某种方式确定要开始的顶级组件,这取决于他们选择的顶级分区类型。在此之外,架构师可以自由地创建他们想要的任何组件,然后将领域功能映射到这些组件上,以查看行为应该位于何处。虽然这听起来可能是任意的,但如果架构师从头设计一个系统,开始时很难有更具体的东西。从这组初始组件中实现良好设计的可能性令人沮丧地小,这就是为什么架构师必须对组件设计进行迭代以加以改进。
Assign Requirements to Components 将需求分配给组件
Once an architect has identified initial components, the next step aligns requirements (or user stories) to those components to see how well they fit. This may entail creating new components, consolidating existing ones, or breaking components apart because they have too much responsibility. This mapping doesn’t have to be exact; the architect is attempting to find a good coarse-grained substrate to allow further design and refinement by architects, tech leads, and/or developers. 一旦架构师确定了初始组件,下一步是将需求(或用户故事)与这些组件对齐,以查看它们的适配程度。这可能涉及创建新组件、整合现有组件,或拆分组件,因为它们承担了过多的责任。这种映射不必完全准确;架构师试图找到一个良好的粗粒度基底,以便让架构师、技术负责人和/或开发人员进行进一步的设计和完善。
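As a rough illustration of this step, the sketch below groups user stories by the component they were assigned to and flags components that may be carrying too much responsibility. The stories, the assignments, and the threshold of three stories per component are all hypothetical; in practice the judgment is qualitative rather than a simple count.

```python
from collections import defaultdict

# Hypothetical user stories mapped to candidate components (assumed data).
STORIES = {
    "customer pays by card": "Purchase",
    "customer redeems coupon": "Promotion",
    "customer builds a sandwich": "MakeOrder",
    "customer saves a favorite order": "MakeOrder",
    "shop adds a local special": "MakeOrder",
    "kitchen receives order ticket": "MakeOrder",
    "driver gets delivery address": "Delivery",
}


def assign(stories: dict[str, str]) -> dict[str, list[str]]:
    """Group stories by the component each one was assigned to."""
    by_component: dict[str, list[str]] = defaultdict(list)
    for story, component in stories.items():
        by_component[component].append(story)
    return dict(by_component)


def overloaded(assignment: dict[str, list[str]], limit: int) -> list[str]:
    """Return components holding more stories than the (arbitrary) limit."""
    return sorted(c for c, held in assignment.items() if len(held) > limit)


assignment = assign(STORIES)
candidates_to_split = overloaded(assignment, limit=3)
print(candidates_to_split)
```

A flagged component is only a prompt for discussion: the architect might split it, move stories elsewhere, or decide the grouping is still cohesive.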
Analyze Roles and Responsibilities 分析角色和职责
When assigning stories to components, the architect also looks at the roles and responsibilities elucidated during the requirements to make sure that the granularity matches. Thinking about both the roles and behaviors the application must support allows the architect to align the component and domain granularity. One of the greatest challenges for architects entails discovering the correct granularity for components, which encourages the iterative approach described here. 在将故事分配给组件时,架构师还会查看在需求阶段阐明的角色和责任,以确保粒度匹配。考虑应用程序必须支持的角色和行为使架构师能够对齐组件和领域粒度。架构师面临的最大挑战之一是发现组件的正确粒度,这促使了这里描述的迭代方法。
Analyze Architecture Characteristics 分析架构特征
When assigning requirements to components, the architect should also look at the architecture characteristics discovered earlier in order to think about how they might impact component division and granularity. For example, while two parts of a system might deal with user input, the part that deals with hundreds of concurrent users will need different architecture characteristics than another part that needs to support only a few. Thus, while a purely functional view of component design might yield a single component to handle user interaction, analyzing the architecture characteristics will lead to a subdivision. 在将需求分配给组件时,架构师还应考虑之前发现的架构特性,以思考它们可能如何影响组件的划分和粒度。例如,虽然系统的两个部分可能都处理用户输入,但处理数百个并发用户的部分将需要与仅支持少数用户的部分不同的架构特性。因此,尽管纯粹的功能视图可能会产生一个处理用户交互的单一组件,但分析架构特性将导致进一步的细分。
Restructure Components 重构组件
Feedback is critical in software design. Thus, architects must continually iterate on their component design with developers. Designing software provides all kinds of unexpected difficulties; no one can anticipate all the unknown issues that usually occur during software projects. Thus, an iterative approach to component design is key. First, it’s virtually impossible to account for all the different discoveries and edge cases that will arise and encourage redesign. Second, as the architect and developers delve more deeply into building the application, they gain a more nuanced understanding of where behavior and roles should lie. 反馈在软件设计中至关重要。因此,架构师必须与开发人员不断迭代他们的组件设计。设计软件会带来各种意想不到的困难;没有人能预见在软件项目中通常会出现的所有未知问题。因此,组件设计的迭代方法是关键。首先,几乎不可能考虑到所有将会出现并促使重新设计的发现和边缘案例。其次,随着架构师和开发人员更深入地构建应用程序,他们对行为和角色应该在哪里有了更细致的理解。
Component Granularity 组件粒度
Finding the proper granularity for components is one of an architect’s most difficult tasks. Too fine-grained a component design leads to too much communication between components to achieve results. Too coarse-grained components encourage high internal coupling, which leads to difficulties in deployability and testability, as well as modularity-related negative side effects. 找到组件的适当粒度是架构师最困难的任务之一。粒度过细的组件设计会导致组件之间的通信过多,从而难以实现结果。粒度过粗的组件则会导致内部耦合度过高,这会导致可部署性和可测试性方面的困难,以及与模块化相关的负面影响。
Component Design 组件设计
No accepted “correct” way exists to design components. Rather, a wide variety of techniques exist, all with varying trade-offs and each coupled to the software development process used by the team and organization. In all of them, an architect takes requirements and tries to determine what coarse-grained building blocks will make up the application. Here, we talk about a few general ways to discover components and traps to avoid. 没有公认的“正确”方式来设计组件。相反,存在多种技术,它们各有不同的权衡,并且都与团队和组织使用的软件开发过程相关联。在所有这些技术中,架构师都会获取需求并尝试确定哪些粗粒度构建块将构成应用程序。在这里,我们讨论一些发现组件的一般方法和需要避免的陷阱。
Discovering Components 发现组件
Architects, often in collaboration with other roles such as developers, business analysts, and subject matter experts, create an initial component design based on general knowledge of the system and how they choose to decompose it, based on technical or domain partitioning. The team goal is an initial design that partitions the problem space into coarse chunks that take into account differing architecture characteristics. 架构师通常与开发人员、业务分析师和主题专家等其他角色合作,根据对系统的一般知识以及他们选择的技术或领域划分,创建初步的组件设计。团队的目标是一个初步设计,将问题空间划分为粗略的块,以考虑不同的架构特征。
Entity trap 实体陷阱
While there is no one true way to ascertain components, a common anti-pattern lurks: the entity trap. Say that an architect is working on designing components for our kata Going, Going, Gone and ends up with a design resembling Figure 8-9. 虽然没有一种绝对正确的方法来确定组件,但一个常见的反模式潜伏着:实体陷阱。假设一个架构师正在为我们的 kata Going, Going, Gone 设计组件,最终得到了一个类似于图 8-9 的设计。
Figure 8-9. Building an architecture as an object-relational mapping 图 8-9. 将架构构建为对象关系映射
In Figure 8-9, the architect has basically taken each entity identified in the requirements and made a Manager component based on that entity. This isn’t an architecture; it’s an object-relational mapping (ORM) of a framework to a database. In other words, if a system only needs simple database CRUD operations (create, read, update, delete), then the architect can download a framework to create user interfaces directly from the database. Many popular ORM frameworks exist to solve this common CRUD behavior. 在图 8-9 中,架构师基本上根据需求中识别的每个实体创建了一个管理器组件。这并不是架构;它是一个框架到数据库的对象关系映射(ORM)。换句话说,如果一个系统只需要简单的数据库 CRUD 操作(创建、读取、更新、删除),那么架构师可以下载一个框架直接从数据库创建用户界面。许多流行的 ORM 框架存在以解决这种常见的 CRUD 行为。
Naked Objects and Similar Frameworks 裸对象及类似框架
More than a decade ago, a family of frameworks appeared that makes building simple CRUD applications trivial, exemplified by Naked Objects (which has since split into two projects, a .NET version still called NakedObjects, and a Java version that moved to the Apache open source foundation under the name Isis). The premise behind these frameworks is to build a user interface frontend directly on database entities. For example, in Naked Objects, the developer points the framework to database tables, and the framework builds a user interface based on the tables and their defined relationships. 十多年前,出现了一系列框架,使得构建简单的 CRUD 应用程序变得微不足道,以 Naked Objects 为例(该框架后来分裂为两个项目,一个是仍称为 NakedObjects 的 .NET 版本,另一个是转移到 Apache 开源基金会的 Java 版本,名为 Isis)。这些框架的前提是直接基于数据库实体构建用户界面前端。例如,在 Naked Objects 中,开发者将框架指向数据库表,框架根据表及其定义的关系构建用户界面。
Several other popular frameworks exist that basically provide a default user interface based on database table structure: the scaffolding feature of the Ruby on Rails framework provides the same kind of default mappings from website to database (with many options to extend and add sophistication to the resulting application). 还有几个其他流行的框架,它们基本上提供了基于数据库表结构的默认用户界面:Ruby on Rails 框架的脚手架功能提供了从网站到数据库的相同类型的默认映射(有许多选项可以扩展并增加生成应用程序的复杂性)。
If an architect’s needs require merely a simple mapping from a database to a user interface, full-blown architecture isn’t necessary; one of these frameworks will suffice. 如果架构师的需求仅仅是将数据库映射到用户界面,那么不需要完整的架构;其中一个框架就足够了。
The entity trap anti-pattern arises when an architect incorrectly identifies the database relationships as workflows in the application, a correspondence that rarely manifests in the real world. Rather, this anti-pattern generally indicates lack of thought about the actual workflows of the application. Components created with the entity trap also tend to be too coarse-grained, offering no guidance whatsoever to the development team in terms of the packaging and overall structuring of the source code. 实体陷阱反模式出现在架构师错误地将数据库关系识别为应用程序中的工作流时,这种对应关系在现实世界中很少出现。相反,这种反模式通常表明对应用程序实际工作流缺乏思考。使用实体陷阱创建的组件往往过于粗粒度,根本无法为开发团队在源代码的打包和整体结构方面提供任何指导。
Actor/Actions approach Actor/Actions 方法
The actor/actions approach is a popular way that architects use to map requirements to components. In this approach, originally defined by the Rational Unified Process, architects identify actors who perform activities with the application and the actions those actors may perform. It provides a technique for discovering the typical users of the system and what kinds of things they might do with the system. 演员/动作方法是架构师用来将需求映射到组件的一种流行方式。在这种方法中,最初由 Rational Unified Process 定义,架构师识别出在应用程序中执行活动的演员以及这些演员可能执行的动作。它提供了一种发现系统典型用户及其可能与系统进行的操作的技术。
The actor/actions approach became popular in conjunction with particular software development processes, especially more formal processes that favor a significant portion of upfront design. It is still popular and works well when the requirements feature distinct roles and the kinds of actions they can perform. This style of component decomposition works well for all types of systems, monolithic or distributed. Actor/Actions 方法随着特定的软件开发流程而流行,尤其是那些偏好大量前期设计的较正式流程。它仍然很受欢迎,并且在需求具有明确角色和它们可以执行的操作类型时效果很好。这种组件分解的风格适用于所有类型的系统,无论是单体还是分布式。
Event storming 事件风暴
Event storming as a component discovery technique comes from domain-driven design (DDD) and shares popularity with microservices, also heavily influenced by DDD. In event storming, the architect assumes the project will use messages and/or events to communicate between the various components. To that end, the team tries to determine which events occur in the system based on requirements and identified roles, and build components around those event and message handlers. This works well in distributed architectures like microservices that use events and messages, because it helps architects define the messages used in the eventual system. 事件风暴作为一种组件发现技术源自领域驱动设计(DDD),并与微服务共享流行度,也受到 DDD 的强烈影响。在事件风暴中,架构师假设项目将使用消息和/或事件在各个组件之间进行通信。为此,团队尝试根据需求和识别的角色确定系统中发生的事件,并围绕这些事件和消息处理程序构建组件。这在使用事件和消息的分布式架构(如微服务)中效果很好,因为它帮助架构师定义最终系统中使用的消息。
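A minimal sketch of the idea, assuming an in-process publish/subscribe mechanism: events identified from requirements become the messages, and components are defined by the events they handle. The `EventBus` class, the event names, and the payload fields are hypothetical and chosen only for illustration; a real distributed system would use a message broker rather than an in-memory dictionary.

```python
from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class EventBus:
    """Tiny in-memory publish/subscribe bus (illustrative only)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> None:
        # Deliver the event to every handler registered for its type.
        for handler in self._handlers[event_type]:
            handler(payload)


# Each handler stands in for a component discovered around an event.
log: list[tuple[str, object]] = []
bus = EventBus()
bus.subscribe("BidPlaced", lambda e: log.append(("BidTracker", e["amount"])))
bus.subscribe("AuctionClosed", lambda e: log.append(("Payment", e["winner"])))

bus.publish("BidPlaced", {"amount": 120})
bus.publish("AuctionClosed", {"winner": "bidder-7"})
print(log)
```

The list of event types doubles as a first draft of the message contracts between components, which is the benefit the text attributes to this technique.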
Workflow approach 工作流方法
An alternative to event storming offers a more generic approach for architects not using DDD or messaging. The workflow approach models the components around workflows, much like event storming, but without the explicit constraints of building a message-based system. A workflow approach identifies the key roles, determines the kinds of workflows these roles engage in, and builds components around the identified activities. 事件风暴的替代方案为不使用 DDD 或消息的架构师提供了一种更通用的方法。工作流方法围绕工作流对组件进行建模,类似于事件风暴,但没有构建基于消息的系统的明确限制。工作流方法识别关键角色,确定这些角色参与的工作流类型,并围绕识别的活动构建组件。
None of these techniques is superior to the others; all offer a different set of trade-offs. If a team uses a waterfall approach or other older software development processes, they might prefer the Actor/Actions approach because it is general. When using DDD and corresponding architectures like microservices, event storming matches the software development process exactly. 这些技术没有哪一种优于其他;它们都提供了一组不同的权衡。如果一个团队使用瀑布模型或其他较旧的软件开发流程,他们可能会更喜欢 Actor/Actions 方法,因为它是通用的。当使用 DDD 和相应的架构如微服务时,事件风暴与软件开发过程完全匹配。
Case Study: Going, Going, Gone: Discovering Components 案例研究:Going, Going, Gone:发现组件
If a team has no special constraints and is looking for a good general-purpose component decomposition, the Actor/Actions approach works well as a generic solution. It’s the one we use in our case study for Going, Going, Gone. 如果一个团队没有特殊的约束,并且正在寻找一个好的通用组件分解,Actor/Actions 方法作为通用解决方案效果很好。这是我们在案例研究《Going, Going, Gone》中使用的方法。
In Chapter 7, we introduced the architecture kata for Going, Going, Gone (GGG) and discovered architecture characteristics for this system. This system has three obvious roles: the bidder, the auctioneer, and a frequent participant in this modeling technique, the system, for internal actions. The roles interact with the application, represented here by the system, which identifies when the application initiates an event rather than one of the roles. For example, in GGG, once the auction is complete, the system triggers the payment system to process payments. 在第七章中,我们介绍了 Going, Going, Gone(GGG)的架构 kata,并发现了该系统的架构特征。该系统有三个明显的角色:竞标者、拍卖师,以及在这种建模技术中频繁参与的角色:系统,用于内部操作。这些角色与应用程序进行交互,这里由系统表示,系统识别何时是应用程序发起事件,而不是其中一个角色。例如,在 GGG 中,一旦拍卖完成,系统会触发支付系统来处理付款。
We can also identify a starting set of actions for each of these roles: 我们还可以为每个角色确定一组初始行动
Bidder 投标人
View live video stream, view live bid stream, place a bid 查看实时视频流,查看实时竞标流,提交竞标
Auctioneer 拍卖师
Enter live bids into system, receive online bids, mark item as sold 将实时竞标输入系统,接收在线竞标,将物品标记为已售出
System 系统
Start auction, make payment, track bidder activity 开始拍卖,进行支付,跟踪竞标者活动
Given these actions, we can iteratively build a set of starter components for GGG; one such solution appears in Figure 8-10. 鉴于这些操作,我们可以迭代地为 GGG 构建一组初始组件;一个这样的解决方案出现在图 8-10 中。
Figure 8-10. Initial set of components for Going, Going, Gone 图 8-10. Going, Going, Gone 的初始组件集
In Figure 8-10, each of the roles and actions maps to a component, which in turn may need to collaborate on information. These are the components we identified for this solution: 在图 8-10 中,每个角色和行动都映射到一个组件,而这个组件可能需要在信息上进行协作。这些是我们为这个解决方案识别的组件:
VideoStreamer 视频流媒体播放器
Streams a live auction to users. 将实时拍卖流式传输给用户。
BidStreamer
Streams bids as they occur to the users. Both VideoStreamer and BidStreamer offer read-only views of the auction to the bidder. 实时向用户传送出价。VideoStreamer 和 BidStreamer 都为竞标者提供只读的拍卖视图。
BidCapture
This component captures bids from both the auctioneer and bidders. 该组件捕获来自拍卖师和竞标者的出价。
BidTracker
Tracks bids and acts as the system of record. 跟踪投标并充当记录系统。
AuctionSession 拍卖会话
Starts and stops an auction. When the auction ends, it performs the payment and resolution steps, including notifying bidders of ending. 开始和结束拍卖。当拍卖结束时,执行支付和解决步骤,包括通知竞标者拍卖结束。
Payment 付款
Third-party payment processor for credit card payments. 第三方支付处理器用于信用卡支付。
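The actor/actions mapping above can be written down as data and checked for gaps. The actors, actions, and component names come from the text; the assignment of each action to a specific component is an assumption made here for illustration (Figure 8-10 is the authoritative picture), and the two helper functions are hypothetical.

```python
# Actors and actions as listed in the text.
ACTIONS_BY_ACTOR = {
    "Bidder": ["view live video stream", "view live bid stream", "place a bid"],
    "Auctioneer": ["enter live bids into system", "receive online bids", "mark item as sold"],
    "System": ["start auction", "make payment", "track bidder activity"],
}

# Assumed action-to-component assignment (a guess, not taken from the figure).
COMPONENT_FOR_ACTION = {
    "view live video stream": "VideoStreamer",
    "view live bid stream": "BidStreamer",
    "place a bid": "BidCapture",
    "enter live bids into system": "BidCapture",
    "receive online bids": "BidStreamer",
    "mark item as sold": "AuctionSession",
    "start auction": "AuctionSession",
    "make payment": "Payment",
    "track bidder activity": "BidTracker",
}


def unassigned_actions() -> list[str]:
    """Actions that no component has been given responsibility for."""
    return sorted(
        action
        for actions in ACTIONS_BY_ACTOR.values()
        for action in actions
        if action not in COMPONENT_FOR_ACTION
    )


def components_by_actor() -> dict[str, list[str]]:
    """Components each actor touches, derived from the assignment above."""
    return {
        actor: sorted({COMPONENT_FOR_ACTION[a] for a in actions})
        for actor, actions in ACTIONS_BY_ACTOR.items()
    }


print(unassigned_actions())
print(components_by_actor())
```

In this assumed mapping, BidCapture is touched by both the bidder and the auctioneer, which is the overlap the following analysis of architecture characteristics examines.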
Referring to the component identification flow diagram in Figure 8-8, after the initial identification of components, the architect next analyzes architecture characteristics to determine if that will change the design. For this system, the architect can definitely identify different sets of architecture characteristics. For example, the current design features a BidCapture component to capture bids from both bidders and the auctioneer, which makes sense functionally: capturing bids from anyone can be handled the same way. However, what about architecture characteristics around bid capture? The auctioneer doesn’t need the same level of scalability or elasticity as potentially thousands of bidders. By the same token, architecture characteristics like reliability (connections don’t drop) and availability (the system is up) may need to be higher for the auctioneer than for other parts of the system. For example, while it’s bad for business if a bidder can’t log in to the site or if they suffer from a dropped connection, it’s disastrous to the auction if either of those things happens to the auctioneer. 参考图 8-8 中的组件识别流程图,在初步识别组件后,架构师接下来分析架构特性,以确定这是否会改变设计。对于该系统,架构师可以明确识别出不同的架构特性。例如,当前设计具有一个 BidCapture 组件,用于从投标人和拍卖师那里捕获投标,这在功能上是合理的:从任何人那里捕获投标都可以以相同的方式处理。然而,关于投标捕获的架构特性呢?拍卖师不需要与可能有成千上万的投标人相同级别的可扩展性或弹性。同样,对拍卖师而言,可靠性(连接不掉线)和可用性(系统正常运行)等架构特性可能需要高于系统的其他部分。例如,如果投标人无法登录网站或遭遇掉线,对业务来说是坏事,但如果拍卖师发生这两种情况中的任何一种,对拍卖来说就是灾难。
Because they have differing levels of architecture characteristics, the architect decides to split the BidCapture component into BidCapture and AuctioneerCapture so that each of the two components can support differing architecture characteristics. The updated design appears in Figure 8-11. 由于它们具有不同级别的架构特性,架构师决定将 BidCapture 组件拆分为 BidCapture 和 AuctioneerCapture,以便这两个组件可以支持不同的架构特性。更新后的设计如图 8-11 所示。
The architect creates a new component for AuctioneerCapture and updates information links to both BidStreamer (so that online bidders see the live bids) and BidTracker, which is managing the bid streams. Note that BidTracker is now the component that will unify the two very different information streams: the single stream of information from the auctioneer and the multiple streams from bidders. 架构师为 AuctioneerCapture 创建了一个新组件,并更新了与 BidStreamer(以便在线竞标者可以看到实时竞标)和管理竞标流的 BidTracker 的信息链接。请注意,BidTracker 现在是将来自拍卖师的单一信息流和来自竞标者的多个信息流统一起来的组件。
Figure 8-11. Incorporating architecture characteristics into GGG component design 图 8-11. 将架构特性纳入 GGG 组件设计
The design shown in Figure 8-11 isn’t likely the final design. More requirements must be uncovered (how do people register, administration functions around payment, and so on). However, this example provides a good starting point for iterating further on the design. 图 8-11 中显示的设计不太可能是最终设计。还需要发现更多需求(人们如何注册、与支付相关的管理功能等等)。然而,这个例子为进一步迭代设计提供了一个良好的起点。
This is one possible set of components to solve the GGG problem, but it’s not necessarily correct, nor is it the only one. Few software systems have only one way that developers can implement them; every design has different sets of trade-offs. As an architect, don’t obsess over finding the one true design, because many will suffice (and are less likely to be overengineered). Rather, try to objectively assess the trade-offs between different design decisions, and choose the one that has the least worst set of trade-offs. 这是解决 GGG 问题的一种可能组件组合,但这并不一定是正确的,也不是唯一的。很少有软件系统只有一种开发者可以实现它们的方式;每个设计都有不同的权衡。作为架构师,不要过于执着于寻找唯一的设计,因为许多设计都足够(而且不太可能过度工程化)。相反,尽量客观地评估不同设计决策之间的权衡,选择权衡组合最不坏的那个。
Architecture Quantum Redux: Choosing Between Monolithic Versus Distributed Architectures 架构量子重构:选择单体架构与分布式架构之间的区别
Recalling the discussion defining architecture quantum in “Architectural Quanta and Granularity” on page 92, the architecture quantum defines the scope of architecture characteristics. That in turn leads an architect toward an important decision as they finish their initial component design: should the architecture be monolithic or distributed? 回忆在第 92 页“建筑量子与粒度”中定义建筑量子的讨论,建筑量子定义了建筑特征的范围。这反过来又引导架构师在完成初步组件设计时做出一个重要决定:架构应该是单体的还是分布式的?
A monolithic architecture typically features a single deployable unit, including all functionality of the system running in a single process, typically connected to a single database. Types of monolithic architectures include the layered and modular monolith, discussed fully in Chapter 10. A distributed architecture is the opposite: the application consists of multiple services running in their own ecosystem, communicating via networking protocols. Distributed architectures may feature finer-grained deployment models, where each service may have its own release cadence and engineering practices, based on the development team and their priorities. 单体架构通常具有一个可部署的单元,包括在单个进程中运行的系统的所有功能,通常连接到一个单一的数据库。单体架构的类型包括分层单体和模块化单体,详见第 10 章。分布式架构则相反:应用程序由多个服务组成,这些服务在各自的生态系统中运行,通过网络协议进行通信。分布式架构可能具有更细粒度的部署模型,每个服务可能根据开发团队及其优先级拥有自己的发布节奏和工程实践。
Each architecture style offers a variety of trade-offs, covered in Part II. However, the fundamental decision rests on how many quanta the architect discovers during the design process. If the system can manage with a single quantum (in other words, one set of architecture characteristics), then a monolith architecture offers many advantages. On the other hand, differing architecture characteristics for components, as illustrated in the GGG component analysis, require a distributed architecture to accommodate them. For example, the VideoStreamer and BidStreamer both offer read-only views of the auction to bidders. From a design standpoint, an architect would rather not deal with read-only streaming mixed with high-scale updates. Along with the aforementioned differences between bidder and auctioneer, these differing characteristics lead an architect to choose a distributed architecture. 每种架构风格都提供了各种权衡,详见第二部分。然而,根本的决定在于架构师在设计过程中发现了多少个量子。如果系统可以仅用一个量子(换句话说,一组架构特征)来管理,那么单体架构提供了许多优势。另一方面,如 GGG 组件分析所示,组件的不同架构特征需要分布式架构来加以适应。例如,VideoStreamer 和 BidStreamer 都为竞标者提供了对拍卖的只读视图。从设计的角度来看,架构师更不愿意处理只读流与高频更新混合的情况。加上竞标者和拍卖者之间上述的差异,这些不同的特征使得架构师选择分布式架构。
The ability to determine a fundamental design characteristic of architecture (monolith versus distributed) early in the design process highlights one of the advantages of using the architecture quantum as a way of analyzing architecture characteristics scope and coupling. 在设计过程中尽早确定架构的基本设计特征(单体与分布式)突显了将架构量子作为分析架构特征范围和耦合的一种方法的优势。
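The reasoning above can be approximated in a few lines: group components by the set of architecture characteristics they need, treat each distinct set as a candidate quantum, and lean toward a monolith only when a single set remains. This is a deliberate simplification, since the quantum as defined earlier also involves deployability and coupling. The component names come from the GGG analysis; the characteristic labels assigned to each component and the function names are assumptions for illustration.

```python
# Assumed characteristic profiles for some GGG components (illustrative).
PROFILES = {
    "VideoStreamer": frozenset({"read-only streaming"}),
    "BidStreamer": frozenset({"read-only streaming"}),
    "BidCapture": frozenset({"scalability", "elasticity"}),
    "AuctioneerCapture": frozenset({"reliability", "availability"}),
}


def candidate_quanta(profiles: dict[str, frozenset[str]]) -> list[list[str]]:
    """Group components that share an identical characteristics profile."""
    groups: dict[frozenset[str], list[str]] = {}
    for component, profile in profiles.items():
        groups.setdefault(profile, []).append(component)
    return sorted(sorted(group) for group in groups.values())


def suggested_deployment(profiles: dict[str, frozenset[str]]) -> str:
    """One profile suggests a monolith; several suggest distribution."""
    return "monolith" if len(candidate_quanta(profiles)) == 1 else "distributed"


print(candidate_quanta(PROFILES))
print(suggested_deployment(PROFILES))
```

With these assumed profiles the components fall into three groups, so the sketch points toward a distributed architecture, in line with the conclusion reached in the text for GGG.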
Architecture Styles 架构风格
The difference between an architecture style and an architecture pattern can be confusing. We define an architecture style as the overarching structure of how the user interface and backend source code are organized (such as within layers of a monolithic deployment or separately deployed services) and how that source code interacts with a datastore. Architecture patterns, on the other hand, are lower-level design structures that help form specific solutions within an architecture style (such as how to achieve high scalability or high performance within a set of operations or between sets of services). 架构风格和架构模式之间的区别可能会令人困惑。我们将架构风格定义为用户界面和后端源代码组织的总体结构(例如,在单体部署的层中或单独部署的服务中)以及该源代码如何与数据存储交互。另一方面,架构模式是较低级别的设计结构,帮助在架构风格中形成特定解决方案(例如,如何在一组操作或服务集之间实现高可扩展性或高性能)。
Understanding architecture styles occupies much of the time and effort for new architects because the styles are both important and numerous. Architects must understand the various styles and the trade-offs encapsulated within each to make effective decisions; each architecture style embodies a well-known set of trade-offs that help an architect make the right choice for a particular business problem. 理解架构风格占据了新架构师大量的时间和精力,因为这些风格既重要又数量众多。架构师必须理解各种风格及其所包含的权衡,以便做出有效的决策;每种架构风格都体现了一组众所周知的权衡,帮助架构师为特定的业务问题做出正确的选择。
CHAPTER 9 第 9 章
Foundations 基础
Architecture styles, sometimes called architecture patterns, describe a named relationship of components covering a variety of architecture characteristics. An architecture style name, similar to design patterns, creates a single name that acts as shorthand between experienced architects. For example, when an architect talks about a layered monolith, the listener understands aspects of structure, which kinds of architecture characteristics work well (and which ones can cause problems), typical deployment models, data strategies, and a host of other information. Thus, architects should be familiar with the basic names of fundamental generic architecture styles. 架构风格,有时称为架构模式,描述了一种命名的组件关系,涵盖各种架构特征。架构风格名称,类似于设计模式,为经验丰富的架构师之间创建了一个作为简写的单一名称。例如,当架构师谈论分层单体时,听者就能理解结构的各个方面,哪些架构特征运作良好(以及哪些可能导致问题),典型的部署模型,数据策略,以及其他大量信息。因此,架构师应该熟悉基本的通用架构风格的名称。
Each name captures a wealth of understood detail, one of the purposes of design patterns. An architecture style describes the topology, assumed and default architecture characteristics, both beneficial and detrimental. We cover many common modern architecture patterns in the remainder of this section of the book (Part II). However, architects should be familiar with several fundamental patterns that appear embedded within the larger patterns. 每个名称都捕捉了丰富的已知细节,这是设计模式的目的之一。架构风格描述了拓扑、假定和默认的架构特征,包括有益和有害的方面。在本书的这一部分(第二部分)中,我们将讨论许多常见的现代架构模式。然而,架构师应该熟悉几种嵌入在更大模式中的基本模式。
Fundamental Patterns 基本模式
Several fundamental patterns appear again and again throughout the history of software architecture because they provide a useful perspective on organizing code, deployments, or other aspects of architecture. For example, the concept of layers in architecture, separating different concerns based on functionality, is as old as software itself. Yet, the layered pattern continues to manifest in different guises, including modern variants discussed in Chapter 10. 在软件架构的历史中,几个基本模式反复出现,因为它们提供了组织代码、部署或架构其他方面的有用视角。例如,架构中的层次概念,根据功能分离不同的关注点,和软件本身一样古老。然而,分层模式仍然以不同的形式出现,包括第 10 章讨论的现代变体。
Big Ball of Mud 大泥球
Architects refer to the absence of any discernible architecture structure as a Big Ball of Mud, named after the eponymous anti-pattern defined in a paper released in 1997 by Brian Foote and Joseph Yoder: 架构师将任何可辨别的架构结构的缺失称为“大泥球”,这个名称源于 1997 年 Brian Foote 和 Joseph Yoder 发布的一篇论文中定义的同名反模式:
A Big Ball of Mud is a haphazardly structured, sprawling, sloppy, duct-tape-and-baling-wire, spaghetti-code jungle. These systems show unmistakable signs of unregulated growth, and repeated, expedient repair. Information is shared promiscuously among distant elements of the system, often to the point where nearly all the important information becomes global or duplicated. 大泥球是一个结构杂乱、蔓延、邋遢、用胶带和捆绑线修补的意大利面代码丛林。这些系统显示出明显的无序增长和反复的权宜之计修复的迹象。信息在系统的远程元素之间随意共享,常常导致几乎所有重要信息都变成全局的或重复的。
The overall structure of the system may never have been well defined. 系统的整体结构可能从未被明确定义。
If it was, it may have eroded beyond recognition. Programmers with a shred of architectural sensibility shun these quagmires. Only those who are unconcerned about architecture, and, perhaps, are comfortable with the inertia of the day-to-day chore of patching the holes in these failing dikes, are content to work on such systems. 如果是这样,它可能已经侵蚀到无法辨认的地步。具有一点架构敏感性的程序员会避开这些泥潭。只有那些对架构毫不在意的人,也许是对日常修补这些失败堤坝的惯性感到满意的人,才愿意在这样的系统上工作。
-Brian Foote and Joseph Yoder -布赖恩·福特和约瑟夫·约德
In modern terms, a big ball of mud might describe a simple scripting application with event handlers wired directly to database calls, with no real internal structure. Many trivial applications start like this then become unwieldy as they continue to grow. 在现代术语中,一个“大泥球”可能描述的是一个简单的脚本应用程序,其事件处理程序直接连接到数据库调用,没有真正的内部结构。许多微不足道的应用程序都是这样开始的,然后随着它们的不断增长变得难以管理。
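To make the description concrete, here is a deliberately poor sketch of that style: event handlers that validate input, apply a business rule, and issue SQL all in one place, sharing a global connection. The table, handler names, and form fields are hypothetical; the example uses Python's standard `sqlite3` module with an in-memory database so that it runs as written.

```python
import sqlite3

# A single global connection shared by every handler (part of the problem).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, qty INTEGER)")


def on_submit_clicked(form: dict) -> None:
    # UI handler, validation, business rule, and persistence in one function.
    if form["qty"] > 0:
        conn.execute(
            "INSERT INTO orders (item, qty) VALUES (?, ?)",
            (form["item"], form["qty"]),
        )
        conn.commit()


def on_report_clicked() -> int:
    # A second handler reaching straight into the same table.
    row = conn.execute("SELECT COALESCE(SUM(qty), 0) FROM orders").fetchone()
    return row[0]


on_submit_clicked({"item": "widget", "qty": 3})
on_submit_clicked({"item": "gadget", "qty": 0})  # silently ignored
print(on_report_clicked())
```

Nothing here is wrong for a ten-line script. The trouble starts when dozens of handlers all depend on the same tables and the same global state, so that a schema change ripples unpredictably through all of them.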
In general, architects want to avoid this type of architecture at all costs. The lack of structure makes change increasingly difficult. This type of architecture also suffers from problems in deployment, testability, scalability, and performance. 一般来说,架构师希望不惜一切代价避免这种类型的架构。缺乏结构使得变更变得越来越困难。这种类型的架构在部署、可测试性、可扩展性和性能方面也存在问题。
Unfortunately, this architecture anti-pattern occurs quite commonly in the real world. Few architects intend to create one, but many projects inadvertently manage to create a mess because of lack of governance around code quality and structure. For example, Neal worked with a client project whose structure appears in Figure 9-1. 不幸的是,这种架构反模式在现实世界中相当常见。很少有架构师打算创建一个,但许多项目由于缺乏对代码质量和结构的治理,不经意间造成了混乱。例如,Neal 曾与一个客户项目合作,其结构如图 9-1 所示。
The client (whose name is withheld for obvious reasons) created a Java-based web application as quickly as possible over several years. The technical visualization shows their architectural coupling: each dot on the perimeter of the circle represents a class, and each line represents connections between the classes, where bolder lines indicate stronger connections. In this code base, any change to a class makes it difficult to predict rippling side effects to other classes, making change a terrifying affair. 客户(出于明显原因其姓名被隐去)在数年内尽可能快速地创建了一个基于 Java 的 Web 应用程序。技术可视化显示了他们的架构耦合:圆周上的每个点代表一个类,每条线代表类之间的连接,线条越粗表示连接越强。在这个代码库中,对一个类的任何更改都使得预测对其他类的连锁反应变得困难,使得更改成为一件令人恐惧的事情。
Figure 9-1. A Big Ball of Mud architecture visualized from a real code base 图 9-1. 从真实代码库可视化的大泥球架构
Unitary Architecture 单一架构
When software originated, there was only the computer, and software ran on it. Through the various eras of hardware and software evolution, the two started as a single entity, then split as the need for more sophisticated capabilities grew. For example, mainframe computers started as singular systems, then gradually separated data into its own kind of system. Similarly, when personal computers first appeared, much of the commercial development focused on single machines. As networking PCs became common, distributed systems (such as client/server) appeared. 当软件起源时,只有计算机,软件在其上运行。随着硬件和软件演变的各个时代,这两者最初作为一个整体,然后随着对更复杂功能需求的增长而分开。例如,大型机最初作为单一系统出现,然后逐渐将数据分离到其自身的系统中。类似地,当个人计算机首次出现时,许多商业开发集中在单台机器上。随着网络计算机的普及,分布式系统(如客户端/服务器)出现了。
Few unitary architectures exist outside embedded systems and other highly constrained environments. Generally, software systems tend to grow in functionality over time, requiring separation of concerns to maintain operational architecture characteristics, such as performance and scale. 在嵌入式系统和其他高度受限的环境之外,几乎没有单一架构存在。一般来说,软件系统往往随着时间的推移而功能增长,需要关注点分离以维持操作架构特性,如性能和规模。
Client/Server 客户端/服务器
Over time, various forces required partitioning away from a single system; how to do that forms the basis for many of these styles. Many architecture styles deal with how to efficiently separate parts of the system. 随着时间的推移,各种力量要求从单一系统中进行分区;如何做到这一点构成了许多这些风格的基础。许多架构风格处理如何有效地分离系统的各个部分。
A fundamental style in architecture separates technical functionality between frontend and backend, called a two-tier, or client/server, architecture. Many different flavors of this architecture exist, depending on the era and computing capabilities. 一种基本的架构风格将技术功能分为前端和后端,称为两层架构或客户端/服务器架构。根据时代和计算能力,这种架构存在许多不同的变种。
Desktop + database server 桌面 + 数据库服务器
An early personal computer architecture encouraged developers to write rich desktop applications in user interfaces like Windows, separating the data into a separate database server. This architecture coincided with the appearance of standalone database servers that could connect via standard network protocols. It allowed presentation logic to reside on the desktop, while the more computationally intense action (both in volume and complexity) occurred on more robust database servers. 早期的个人计算机架构鼓励开发者在像 Windows 这样的用户界面中编写丰富的桌面应用程序,将数据分离到一个独立的数据库服务器中。这种架构与独立数据库服务器的出现相吻合,这些服务器可以通过标准网络协议连接。它允许展示逻辑驻留在桌面上,而更计算密集的操作(无论是数量还是复杂性)则发生在更强大的数据库服务器上。
Browser + web server
Once modern web development arrived, the common split became a web browser connected to a web server (which in turn was connected to a database server). The separation of responsibilities was similar to the desktop variant but with even thinner clients as browsers, allowing a wider distribution both inside and outside firewalls. Even though the database is separate from the web server, architects often still consider this a two-tier architecture because the web and database servers run on one class of machine within the operations center and the user interface runs on the user’s browser.
Three-tier
An architecture that became quite popular during the late 1990s was the three-tier architecture, which provided even more layers of separation. As tools like application servers became popular in Java and .NET, companies started building even more layers in their topology: a database tier using an industrial-strength database server, an application tier managed by an application server, and a frontend coded in generated HTML and, increasingly, JavaScript as its capabilities expanded.
The three-tier architecture corresponded with network-level protocols such as Common Object Request Broker Architecture (CORBA) and Distributed Component Object Model (DCOM) that facilitated building distributed architectures.
Just as developers today don’t worry about how network protocols like TCP/IP work (they just work), most architects don’t have to worry about this level of plumbing in distributed architectures. The capabilities offered by such tools in that era exist today as either tools (like message queues) or architecture patterns (such as event-driven architecture, covered in Chapter 14).
Three-Tier, Language Design, and Long-Term Implications
During the era in which the Java language was designed, three-tier computing was all the rage. Thus, it was assumed that, in the future, all systems would be three-tier architectures. One of the common headaches with existing languages such as C++ was how cumbersome it was to move objects over the network in a consistent way between systems. Thus, the designers of Java decided to build this capability into the core of the language using a mechanism called serialization. A Java class opts in by implementing the Serializable marker interface, and many core library classes do so. The designers figured that since three-tiered architecture would forever be the architecture style, baking it into the language would offer a great convenience. Of course, that architectural style came and went, yet the leftovers appear in Java to this day, greatly frustrating language designers who want to add modern features that, for backward compatibility, must support serialization, which virtually no one uses today.
Understanding the long-term implications of design decisions has always eluded us, in software as in other engineering disciplines. The perpetual advice to favor simple designs is in many ways a defense against future consequences.
Monolithic Versus Distributed Architectures
Architecture styles can be classified into two main types: monolithic (a single deployment unit of all code) and distributed (multiple deployment units connected through remote access protocols). While no classification scheme is perfect, distributed architectures all share a common set of challenges and issues not found in the monolithic architecture styles, making this classification scheme a good separation between the various architecture styles. This book describes architecture styles of both types in detail.
Distributed architecture styles, while much more powerful in terms of performance, scalability, and availability than monolithic architecture styles, have significant trade-offs for this power. The first group of issues facing all distributed architectures is described in the fallacies of distributed computing, first coined by L. Peter Deutsch and other colleagues from Sun Microsystems in 1994. A fallacy is something that is believed or assumed to be true but is not. All eight of the fallacies of distributed computing apply to distributed architectures today. The following sections describe each fallacy.
Fallacy #1: The Network Is Reliable
Figure 9-2. The network is not reliable
Developers and architects alike assume that the network is reliable, but it is not. While networks have become more reliable over time, they remain generally unreliable. This is significant for all distributed architectures because all distributed architecture styles rely on the network for communication to and from services, as well as between services. As illustrated in Figure 9-2, Service B may be totally healthy, but Service A cannot reach it due to a network problem; or even worse, Service A made a request to Service B to process some data and does not receive a response because of a network issue. This is why things like timeouts and circuit breakers exist between services. The more a system relies on the network (as a microservices architecture does), the less reliable it potentially becomes.
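To make the timeout and circuit breaker idea more concrete, here is a minimal sketch in Python. The `CircuitBreaker` class, its `failure_threshold` and `reset_after` parameters, and the `call_service_b` stand-in are hypothetical names invented for this illustration; they are not part of any particular library, and a production system would more likely rely on an established resilience library than on hand-rolled code.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the remote service."""


class CircuitBreaker:
    """Hypothetical, simplified circuit breaker (illustration only)."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_after = reset_after              # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None                       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Reset window elapsed: fall through and allow one trial call.
        try:
            result = func(*args, **kwargs)
        except (TimeoutError, ConnectionError):
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()       # open (or re-open) the circuit
            raise
        # Success: close the circuit and clear the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def call_service_b():
    """Stand-in for a remote call from Service A to Service B that times out."""
    raise TimeoutError("no response from Service B")


breaker = CircuitBreaker(failure_threshold=3, reset_after=30.0)
outcomes = []
for _ in range(5):
    try:
        breaker.call(call_service_b)
    except TimeoutError:
        outcomes.append("timeout")
    except CircuitOpenError:
        outcomes.append("fail-fast")
print(outcomes)
```

In this sketch the first three calls each wait for (and report) a timeout, after which the breaker opens and the remaining calls fail immediately without touching the network, which is the behavior that protects Service A when Service B is unreachable.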
Fallacy #2: Latency Is Zero
Figure 9-3. Latency is not zero
As Figure 9-3 shows, when a local call is made to another component via a method or function call, that time ($t_{\text{local}}$) is measured in nanoseconds or microseconds. However, when that same call is made through a remote access protocol (such as REST, messaging, or RPC), the time measured to access that service ($t_{\text{remote}}$) is measured in milliseconds, so $t_{\text{remote}}$ is always greater than $t_{\text{local}}$. Latency in any distributed architecture is not zero, yet most architects ignore this fallacy, insisting that they have fast networks. Ask yourself this question: do you know what the average round-trip latency is for a RESTful call in your production environment? Is it 60 milliseconds? Is it 500 milliseconds?
When using any distributed architecture, architects must know this latency average. It is the only way of determining whether a distributed architecture is feasible, particularly when considering microservices (see Chapter 17) due to the fine-grained nature of the services and the amount of communication between those services. Assuming an average of 100 milliseconds of latency per request, chaining together 10 service calls to perform a particular business function adds 1,000 milliseconds to the request! Knowing the average latency is important, but even more important is knowing the 95th to 99th percentile. While an average latency might yield only 60 milliseconds (which is good), the 95th percentile might be 400 milliseconds! It’s usually this “long tail” latency that will kill performance in a distributed architecture. In most cases, architects can get latency values from a network administrator (see “Fallacy #6: There Is Only One Administrator”).
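The difference between average and long-tail latency can be shown with a short Python sketch. The sample latencies below are made-up values chosen so that the average matches the 60-millisecond figure in the text, and `percentile` is a hypothetical helper using the nearest-rank method; real measurements would come from production monitoring.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100   # ceil(pct * n / 100) using integers
    return ordered[rank - 1]


# Hypothetical round-trip latencies in milliseconds for 20 requests.
latencies_ms = [20] * 18 + [400, 440]

average_ms = sum(latencies_ms) / len(latencies_ms)
p95_ms = percentile(latencies_ms, 95)
p99_ms = percentile(latencies_ms, 99)

# Ten synchronous service calls chained in sequence, each averaging 100 ms.
chained_ms = 10 * 100

print(f"average={average_ms} ms, p95={p95_ms} ms, p99={p99_ms} ms, chain={chained_ms} ms")
```

With these assumed samples, the average is 60 ms while the 95th percentile is 400 ms: most requests look fast, yet one request in twenty is slow enough to dominate any business function that chains several calls together.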
Fallacy #3: Bandwidth Is Infinite
Figure 9-4. Bandwidth is not infinite
Bandwidth is usually not a concern in monolithic architectures, because once processing goes into a monolith, little or no bandwidth is required to process that business request. However, as shown in Figure 9-4, once systems are broken apart into smaller deployment units (services) in a distributed architecture such as microservices, communication to and between these services consumes significant bandwidth, causing networks to slow down, thus impacting latency (fallacy #2) and reliability (fallacy #1).
To illustrate the importance of this fallacy, consider the two services shown in Figure 9-4. Let’s say the lefthand service manages the wish list items for the website, and the righthand service manages the customer profile. Whenever a request for a wish list comes into the lefthand service, it must make an interservice call to the righthand customer profile service to get the customer name because that data is needed in the response contract for the wish list, but the wish list service on the lefthand side doesn’t have the name. The customer profile service returns 45 attributes totaling 500 KB to the wish list service, which only needs the name (200 bytes). This is a form of coupling referred to as stamp coupling. This may not sound significant, but requests for the wish list items happen about 2,000 times a second. This means that this interservice call from the wish list service to the customer profile service happens 2,000 times a second. At 500 KB for each request, the bandwidth used for that one interservice call (out of hundreds being made that second) is 1 GB per second!
Stamp coupling in distributed architectures consumes significant amounts of bandwidth. If the customer profile service were to only pass back the data needed by the wish list service (in this case 200 bytes), the total bandwidth used to transmit the data would be only 400 KB per second. Stamp coupling can be resolved in the following ways:
Create private RESTful API endpoints
Use field selectors in the contract
Use GraphQL to decouple contracts
Use value-driven contracts with consumer-driven contracts (CDCs)
Use internal messaging endpoints
Regardless of the technique used, passing the minimal amount of data between services or systems in a distributed architecture is the best way to address this fallacy.
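The bandwidth arithmetic in the wish list example, along with one of the listed remedies (a field selector), can be sketched in a few lines of Python. The sketch assumes decimal units (1 KB = 1,000 bytes); the `select_fields` helper and the sample profile attributes are hypothetical and stand in for whatever contract a real customer profile service exposes.

```python
REQUESTS_PER_SECOND = 2_000
FULL_PROFILE_BYTES = 500_000   # 500 KB, assuming decimal units (1 KB = 1,000 bytes)
NAME_ONLY_BYTES = 200

# Bandwidth consumed per second by this one interservice call.
full_bytes_per_second = REQUESTS_PER_SECOND * FULL_PROFILE_BYTES   # 1 GB per second
name_bytes_per_second = REQUESTS_PER_SECOND * NAME_ONLY_BYTES      # 400 KB per second


def select_fields(profile, fields):
    """Return only the requested attributes of a profile (a simple field selector)."""
    return {name: profile[name] for name in fields if name in profile}


# Hypothetical profile; the real one in the example has 45 attributes.
profile = {
    "customer_id": 42,
    "name": "Pat Example",
    "email": "pat@example.com",
    "loyalty_tier": "gold",
}
response = select_fields(profile, ["name"])

print(full_bytes_per_second, name_bytes_per_second, response)
```

Here the wish list service would ask only for `name`, so the response carries a single attribute instead of the whole profile, which is what brings the per-second transfer down from 1 GB to 400 KB in the example.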
Fallacy #4: The Network Is Secure
Figure 9-5. The network is not secure
Most architects and developers get so comfortable using virtual private networks (VPNs), trusted networks, and firewalls that they tend to forget about this fallacy of distributed computing: the network is not secure. Security becomes much more challenging in a distributed architecture. As shown in Figure 9-5, each and every endpoint to each distributed deployment unit must be secured so that unknown or bad requests do not make it to that service. The surface area for threats and attacks increases by orders of magnitude when moving from a monolithic to a distributed architecture. Having to secure every endpoint, even when doing interservice communication, is another reason performance tends to be slower in synchronous, highly distributed architectures such as microservices or service-based architecture.
Fallacy #5: The Topology Never Changes
Figure 9-6. The network topology always changes
This fallacy refers to the overall network topology, including all of the routers, hubs, switches, firewalls, networks, and appliances used within the overall network. Architects assume that the topology is fixed and never changes. Of course it changes. It changes all the time. What is the significance of this fallacy?
Suppose an architect comes into work on a Monday morning, and everyone is running around like crazy because services keep timing out in production. The architect works with the teams, frantically trying to figure out why this is happening. No new services were deployed over the weekend. What could it be? After several hours the architect discovers that a minor network upgrade happened at 2 a.m. that morning. This supposedly “minor” network upgrade invalidated all of the latency assumptions, triggering timeouts and circuit breakers.
Architects must be in constant communication with operations and network administrators to know what is changing and when so that they can make adjustments accordingly to reduce the type of surprise previously described. This may seem obvious and easy, but it is not. As a matter of fact, this fallacy leads directly to the next fallacy.
Fallacy #6: There Is Only One Administrator
Figure 9-7. There are many network administrators, not just one
Architects fall into this fallacy all the time, assuming they only need to collaborate and communicate with one administrator. As shown in Figure 9-7, there are dozens of network administrators in a typical large company. Who should the architect talk to with regard to latency (“Fallacy #2: Latency Is Zero”) or topology changes (“Fallacy #5: The Topology Never Changes”)? This fallacy points to the complexity of distributed architecture and the amount of coordination that must happen to get everything working correctly. Monolithic applications do not require this level of communication and collaboration due to the single deployment unit characteristics of those architecture styles.
Fallacy #7: Transport Cost Is Zero
Figure 9-8. Remote access costs money
Many software architects confuse this fallacy with latency (“Fallacy #2: Latency Is Zero”). Transport cost here does not refer to latency, but rather to actual cost in terms of money associated with making a “simple RESTful call.” Architects assume (incorrectly) that the necessary infrastructure is in place and sufficient for making a simple RESTful call or breaking apart a monolithic application. It is usually not. Distributed architectures cost significantly more than monolithic architectures, primarily due to increased needs for additional hardware, servers, gateways, firewalls, new subnets, proxies, and so on.
Whenever embarking on a distributed architecture, we encourage architects to analyze the current server and network topology with regard to capacity, bandwidth, latency, and security zones to avoid being caught by surprise by this fallacy.
Fallacy #8: The Network Is Homogeneous
Figure 9-9. The network is not homogeneous
Most architects and developers assume a network is homogeneous, supplied by only one network hardware vendor. Nothing could be further from the truth. Most companies have several network hardware vendors in their infrastructure.
So what? The significance of this fallacy is that not all of those heterogeneous hardware vendors play together well. Most of it works, but does Juniper hardware seamlessly integrate with Cisco hardware? Networking standards have evolved over the years, making this less of an issue, but the fact remains that not all situations, loads, and circumstances have been fully tested, and as such, network packets occasionally get lost. This in turn impacts network reliability (“Fallacy #1: The Network Is Reliable”), latency assumptions and assertions (“Fallacy #2: Latency Is Zero”), and assumptions made about the bandwidth (“Fallacy #3: Bandwidth Is Infinite”). In other words, this fallacy ties back into all of the other fallacies, forming an endless loop of confusion and frustration when dealing with networks (which is necessary when using distributed architectures).
Other Distributed Considerations
In addition to the eight fallacies of distributed computing previously described, there are other issues and challenges facing distributed architecture that aren’t present in monolithic architectures. Although the details of these other issues are out of scope for this book, we list and summarize them in the following sections.
Distributed logging
Performing root-cause analysis to determine why a particular order was dropped is very difficult and time-consuming in a distributed architecture due to the distribution of application and system logs. In a monolithic application there is typically only one log, making it easier to trace a request and determine the issue. However, distributed architectures contain dozens to hundreds of different logs, all located in a different place and all with a different format, making it difficult to track down a problem.
Logging consolidation tools such as Splunk help to bring information from various sources and systems together into one consolidated log and console, but these tools only scratch the surface of the complexities involved with distributed logging. Detailed solutions and patterns for distributed logging are outside the scope of this book.
Distributed transactions
Architects and developers take transactions for granted in a monolithic architecture world because they are so straightforward and easy to manage. Standard commits and rollbacks executed from persistence frameworks leverage ACID (atomicity, consistency, isolation, durability) transactions to guarantee that the data is updated in a correct way to ensure high data consistency and integrity. Such is not the case with distributed architectures.
Distributed architectures rely on what is called eventual consistency to ensure the data processed by separate deployment units is at some unspecified point in time all synchronized into a consistent state. This is one of the trade-offs of distributed architecture: high scalability, performance, and availability at the sacrifice of data consistency and data integrity.
Transactional sagas are one way to manage distributed transactions. Sagas utilize either event sourcing for compensation or finite state machines to manage the state of the transaction. In addition to sagas, BASE transactions are used. BASE stands for (B)asic availability, (S)oft state, and (E)ventual consistency. BASE transactions are not a piece of software, but rather a technique. Soft state in BASE refers to the transit of data from a source to a target, as well as the inconsistency between data sources. Based on the basic availability of the systems or services involved, the systems will eventually become consistent through the use of architecture patterns and messaging.
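As a rough illustration of the compensation idea behind sagas, the Python sketch below runs a sequence of steps and, when one fails, undoes the completed steps in reverse order. It is a deliberately simplified, in-process stand-in: it uses neither event sourcing nor a finite state machine, and the `run_saga` function and the order-processing step names are hypothetical.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order; on failure, undo completed steps in reverse."""
    completed = []
    for action, compensation in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensation)
    return True


log = []


def reserve_inventory():
    log.append("inventory reserved")


def release_inventory():
    log.append("inventory released")


def charge_payment():
    log.append("payment charged")


def refund_payment():
    log.append("payment refunded")


def schedule_shipping():
    # Simulates a failure in a third, separately deployed service.
    raise RuntimeError("shipping service unavailable")


def cancel_shipping():
    log.append("shipping cancelled")


succeeded = run_saga([
    (reserve_inventory, release_inventory),
    (charge_payment, refund_payment),
    (schedule_shipping, cancel_shipping),
])
print(succeeded, log)
```

Unlike an ACID rollback, nothing here is atomic: between the payment being charged and refunded, other services can observe the intermediate state, which is the soft state and eventual consistency that BASE describes.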
Contract maintenance and versioning
Another particularly difficult challenge within distributed architecture is contract creation, maintenance, and versioning. A contract is behavior and data that is agreed upon by both the client and the service. Contract maintenance is particularly difficult in distributed architectures, primarily due to decoupled services and systems owned by different teams and departments. Even more complex are the communication models needed for version deprecation.
Layered Architecture Style
The layered architecture, also known as the n-tiered architecture style, is one of the most common architecture styles. This style of architecture is the de facto standard for most applications, primarily because of its simplicity, familiarity, and low cost. It is also a very natural way to develop applications due to Conway’s law, which states that organizations that design systems are constrained to produce designs which are copies of the communication structures of these organizations. In most organizations there are user interface (UI) developers, backend developers, rules developers, and database experts (DBAs). These organizational layers fit nicely into the tiers of a traditional layered architecture, making it a natural choice for many business applications. The layered architecture style also falls into several architectural anti-patterns, including the architecture by implication anti-pattern and the accidental architecture anti-pattern. If a developer or architect is unsure which architecture style they are using, or if an Agile development team “just starts coding,” chances are good that it is the layered architecture style they are implementing.
Topology
Components within the layered architecture style are organized into logical horizontal layers, with each layer performing a specific role within the application (such as presentation logic or business logic). Although there are no specific restrictions in terms of the number and types of layers that must exist, most layered architectures consist of four standard layers: presentation, business, persistence, and database, as illustrated in Figure 10-1. In some cases, the business layer and persistence layer are combined into a single business layer, particularly when the persistence logic (such as SQL or HSQL) is embedded within the business layer components. Thus, smaller applications may have only three layers, whereas larger and more complex business applications may contain five or more layers.
Figure 10-1. Standard logical layers within the layered architecture style
Figure 10-2 illustrates the various topology variants from a physical layering (deployment) perspective. The first variant combines the presentation, business, and persistence layers into a single deployment unit, with the database layer typically represented as a separate external physical database (or filesystem). The second variant physically separates the presentation layer into its own deployment unit, with the business and persistence layers combined into a second deployment unit. Again, with this variant, the database layer is usually physically separated through an external database or filesystem. A third variant combines all four standard layers into a single deployment, including the database layer. This variant might be useful for smaller applications with either an internally embedded database or an in-memory database. Many on-premises (“on-prem”) products are built and delivered to customers using this third variant.
Each layer of the layered architecture style has a specific role and responsibility within the architecture. For example, the presentation layer would be responsible for handling all user interface and browser communication logic, whereas the business layer would be responsible for executing specific business rules associated with the request. Each layer in the architecture forms an abstraction around the work that needs to be done to satisfy a particular business request. For example, the presentation layer doesn’t need to know or worry about how to get customer data; it only needs to display that information on a screen in a particular format. Similarly, the business layer doesn’t need to be concerned about how to format customer data for display on a screen or even where the customer data is coming from; it only needs to get the data from the persistence layer, perform business logic against the data (such as calculating values or aggregating data), and pass that information up to the presentation layer.
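The division of responsibilities described above can be sketched with three small Python classes, one per layer. The class names, the in-memory data, and the summary format are all hypothetical and exist only to show which layer knows about what; a real application would have many components per layer and an actual database behind the persistence layer.

```python
class CustomerPersistence:
    """Persistence layer: knows where customer data lives (here, an in-memory dict)."""

    def __init__(self):
        self._rows = {1: {"name": "Pat Example", "orders": [120.0, 80.0, 50.0]}}

    def find_customer(self, customer_id):
        return self._rows[customer_id]


class CustomerBusiness:
    """Business layer: applies business logic, unaware of storage or display details."""

    def __init__(self, persistence):
        self._persistence = persistence

    def customer_summary(self, customer_id):
        row = self._persistence.find_customer(customer_id)
        # Aggregating data is business logic, so it lives in this layer.
        return {"name": row["name"], "total_spent": sum(row["orders"])}


class CustomerScreen:
    """Presentation layer: formats data for display, unaware of how it was obtained."""

    def __init__(self, business):
        self._business = business

    def render(self, customer_id):
        summary = self._business.customer_summary(customer_id)
        return f"{summary['name']}: ${summary['total_spent']:.2f} total"


screen = CustomerScreen(CustomerBusiness(CustomerPersistence()))
print(screen.render(1))
```

Each class depends only on the layer directly beneath it, so the storage mechanism or the display format could change without touching the other two classes, provided the method contracts stay the same.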
This separation of concerns concept within the layered architecture style makes it easy to build effective roles and responsibility models within the architecture. Components within a specific layer are limited in scope, dealing only with the logic that pertains to that layer. For example, components in the presentation layer only handle presentation logic, whereas components residing in the business layer only handle business logic. This allows developers to leverage their particular technical expertise to focus on the technical aspects of the domain (such as presentation logic or persistence logic). The trade-off of this benefit, however, is a lack of overall agility (the ability to respond quickly to change).
The layered architecture is a technically partitioned architecture (as opposed to a domain-partitioned architecture). Groups of components, rather than being grouped by domain (such as customer), are grouped by their technical role in the architecture (such as presentation or business). As a result, any particular business domain is spread throughout all of the layers of the architecture. For example, the domain of “customer” is contained in the presentation layer, business layer, rules layer, services layer, and database layer, making it difficult to apply changes to that domain. As a result, a domain-driven design approach does not work as well with the layered architecture style.
Layers of Isolation
Each layer in the layered architecture style can be either closed or open. A closed layer means that as a request moves top-down from layer to layer, the request cannot skip any layers, but rather must go through the layer immediately below it to get to the next layer (see Figure 10-3). For example, in a closed-layered architecture, a request originating from the presentation layer must first go through the business layer and then to the persistence layer before finally making it to the database layer.
Figure 10-3. Closed layers within the layered architecture
Notice that in Figure 10-3 it would be much faster and easier for the presentation layer to access the database directly for simple retrieval requests, bypassing any unnecessary layers (what used to be known in the early 2000s as the fast-lane reader pattern). For this to happen, the business and persistence layers would have to be open, allowing requests to bypass other layers. Which is better, open layers or closed layers? The answer to this question lies in a key concept known as layers of isolation.
The layers of isolation concept means that changes made in one layer of the architecture generally don’t affect components in other layers, provided the contracts between those layers remain unchanged. Each layer is independent of the other layers, thereby having little or no knowledge of the inner workings of other layers in the architecture. However, to support layers of isolation, layers involved with the major flow of the request necessarily have to be closed. If the presentation layer can directly access the persistence layer, then changes made to the persistence layer would impact both the business layer and the presentation layer, producing a very tightly coupled application with layer interdependencies between components. This type of architecture then becomes very brittle, as well as difficult and expensive to change.
The layers of isolation concept also allows any layer in the architecture to be replaced without impacting any other layer (again, assuming well-defined contracts and the use of the business delegate pattern). For example, you can leverage the layers of isolation concept within the layered architecture style to replace your older JavaServer Faces (JSF) presentation layer with React.js without impacting any other layer in the application. 隔离层的概念还允许架构中的任何层被替换,而不影响其他任何层(再次假设有明确的合同和使用业务委托模式)。例如,您可以在分层架构风格中利用隔离层的概念,将旧的 JavaServer Faces (JSF) 表现层替换为 React.js,而不影响应用程序中的其他层。
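The layers of isolation idea can be sketched in code. The following is a minimal illustration rather than anything prescribed by the text: the names CustomerRepository, CustomerService, and CustomerScreen are hypothetical. Each layer depends only on the contract of the layer directly below it, so an implementation can be swapped without touching the layers above, as long as the contract stays the same.

```java
import java.util.HashMap;
import java.util.Map;

// Contract exposed by the persistence layer (hypothetical name).
interface CustomerRepository {
    String findName(int customerId);
}

// Contract exposed by the business layer (hypothetical name).
interface CustomerService {
    String displayName(int customerId);
}

// One persistence implementation; it could be replaced by another
// without changes above it as long as the contract is unchanged.
class InMemoryCustomerRepository implements CustomerRepository {
    private final Map<Integer, String> rows = new HashMap<>();

    InMemoryCustomerRepository() {
        rows.put(1, "ada lovelace");
    }

    @Override
    public String findName(int customerId) {
        return rows.getOrDefault(customerId, "unknown");
    }
}

// Business layer: knows only the repository contract.
class DefaultCustomerService implements CustomerService {
    private final CustomerRepository repository;

    DefaultCustomerService(CustomerRepository repository) {
        this.repository = repository;
    }

    @Override
    public String displayName(int customerId) {
        return repository.findName(customerId).toUpperCase();
    }
}

// Presentation layer: knows only the business contract, so it has no
// way to skip the closed business layer and reach persistence directly.
class CustomerScreen {
    private final CustomerService service;

    CustomerScreen(CustomerService service) {
        this.service = service;
    }

    String render(int customerId) {
        return "Customer: " + service.displayName(customerId);
    }
}

class ClosedLayersDemo {
    public static void main(String[] args) {
        CustomerScreen screen = new CustomerScreen(
                new DefaultCustomerService(new InMemoryCustomerRepository()));
        System.out.println(screen.render(1));
    }
}
```

Because CustomerScreen receives only a CustomerService, replacing the presentation technology or the repository implementation is a local change in this sketch.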
Adding Layers 添加层次
While closed layers facilitate layers of isolation and therefore help isolate change within the architecture, there are times when it makes sense for certain layers to be open. For example, suppose there are shared objects within the business layer that contain common functionality for business components (such as date and string utility classes, auditing classes, logging classes, and so on). Suppose there is an architecture decision stating that the presentation layer is restricted from using these shared business objects. This constraint is illustrated in Figure 10-4, with the dotted line going from a presentation component to a shared business object in the business layer. This scenario is difficult to govern and control because architecturally the presentation layer has access to the business layer, and hence has access to the shared objects within that layer. 虽然封闭层促进了隔离层的建立,从而有助于在架构中隔离变更,但有时某些层开放是有意义的。例如,假设在业务层中有共享对象,包含业务组件的公共功能(例如日期和字符串工具类、审计类、日志类等)。假设有一个架构决策,规定表示层不能使用这些共享业务对象。这个约束在图 10-4 中得到了说明,虚线从一个表示组件指向业务层中的共享业务对象。这个场景难以管理和控制,因为从架构上讲,表示层可以访问业务层,因此可以访问该层中的共享对象。
Figure 10-4. Shared objects within the business layer 图 10-4. 业务层中的共享对象
One way to architecturally mandate this restriction is to add to the architecture a new services layer containing all of the shared business objects. Adding this new layer now architecturally restricts the presentation layer from accessing the shared business objects because the business layer is closed (see Figure 10-5). However, the new services layer must be marked as open; otherwise the business layer would be forced to go through the services layer to access the persistence layer. Marking the services layer as open allows the business layer to either access that layer (as indicated by the solid arrow), or bypass the layer and go to the next one down (as indicated by the dotted arrow in Figure 10-5). 一种在架构上强制执行此限制的方法是向架构中添加一个新的服务层,其中包含所有共享的业务对象。添加这个新层现在在架构上限制了表示层访问共享的业务对象,因为业务层是封闭的(见图 10-5)。然而,新的服务层必须标记为开放;否则,业务层将被迫通过服务层访问持久层。将服务层标记为开放允许业务层访问该层(如实线箭头所示),或绕过该层直接访问下一个层(如图 10-5 中的虚线箭头所示)。
Figure 10-5. Adding a new services layer to the architecture 图 10-5. 向架构添加新的服务层
Leveraging the concept of open and closed layers helps define the relationship between architecture layers and request flows. It also provides developers with the necessary information and guidance to understand various layer access restrictions within the architecture. Failure to document or properly communicate which layers in the architecture are open and closed (and why) usually results in tightly coupled and brittle architectures that are very difficult to test, maintain, and deploy. 利用开放和封闭层的概念有助于定义架构层与请求流之间的关系。它还为开发人员提供了必要的信息和指导,以理解架构中各种层的访问限制。未能记录或正确传达架构中哪些层是开放的,哪些层是封闭的(以及原因),通常会导致紧密耦合和脆弱的架构,这使得测试、维护和部署变得非常困难。
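One way to make the open and closed rules explicit is to encode them and derive which layers each layer may call. The sketch below is an illustration under assumptions: the Layer class and the reachableFrom method are hypothetical names, and the layer list mirrors the services-layer scenario described above.

```java
import java.util.ArrayList;
import java.util.List;

// A layer is described by its name and whether it is open (hypothetical model).
class Layer {
    final String name;
    final boolean open;

    Layer(String name, boolean open) {
        this.name = name;
        this.open = open;
    }
}

class LayerAccessRules {
    // Returns the names of the layers that the layer at index `from` may call:
    // always the next layer down, plus any further layers reachable by
    // bypassing open layers. A closed layer stops the walk.
    static List<String> reachableFrom(List<Layer> topDown, int from) {
        List<String> reachable = new ArrayList<>();
        for (int i = from + 1; i < topDown.size(); i++) {
            Layer candidate = topDown.get(i);
            reachable.add(candidate.name);
            if (!candidate.open) {
                break; // a request cannot skip past a closed layer
            }
        }
        return reachable;
    }

    public static void main(String[] args) {
        List<Layer> layers = List.of(
                new Layer("presentation", false),
                new Layer("business", false),
                new Layer("services", true),
                new Layer("persistence", false),
                new Layer("database", false));
        System.out.println(reachableFrom(layers, 0)); // [business]
        System.out.println(reachableFrom(layers, 1)); // [services, persistence]
    }
}
```

The presentation layer can reach only the closed business layer, while the business layer can either use the open services layer or bypass it to reach persistence.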
Other Considerations 其他考虑事项
The layered architecture makes for a good starting point for most applications when it is not known yet exactly which architecture style will ultimately be used. This is a common practice for many microservices efforts when architects are still determining whether microservices is the right architecture choice, but development must begin. However, when using this technique, be sure to keep reuse at a minimum and keep object hierarchies (depth of inheritance tree) fairly shallow so as to maintain a good level of modularity. This will help facilitate the move to another architecture style later on. 分层架构为大多数应用程序提供了一个良好的起点,当尚不确定最终将使用哪种架构风格时。这是许多微服务项目中的一种常见做法,当架构师仍在确定微服务是否是正确的架构选择时,但开发必须开始。然而,在使用这种技术时,请确保最小化重用,并保持对象层次结构(继承树的深度)相对较浅,以维持良好的模块化水平。这将有助于以后迁移到另一种架构风格。
One thing to watch out for with the layered architecture is the architecture sinkhole anti-pattern. This anti-pattern occurs when requests move from layer to layer as simple pass-through processing with no business logic performed within each layer. For example, suppose the presentation layer responds to a simple request from the user to retrieve basic customer data (such as name and address). The presentation layer passes the request to the business layer, which does nothing but pass the request on to the rules layer, which in turn does nothing but pass the request on to the persistence layer, which then makes a simple SQL call to the database layer to retrieve the customer data. The data is then passed all the way back up the stack with no additional processing or logic to aggregate, calculate, apply rules, or transform the data. This results in unnecessary object instantiation and processing, impacting both memory consumption and performance. 需要注意分层架构的一个问题是架构陷阱反模式。该反模式发生在请求在各层之间移动,作为简单的传递处理,而每层内没有执行任何业务逻辑。例如,假设表示层响应用户的简单请求,以检索基本客户数据(例如姓名和地址)。表示层将请求传递给业务层,业务层只将请求传递给规则层,规则层又只将请求传递给持久层,持久层然后向数据库层发出简单的 SQL 调用以检索客户数据。数据随后被传递回整个堆栈,没有额外的处理或逻辑来聚合、计算、应用规则或转换数据。这导致了不必要的对象实例化和处理,影响了内存消耗和性能。
Every layered architecture will have at least some scenarios that fall into the architecture sinkhole anti-pattern. The key to determining whether the architecture sinkhole anti-pattern is at play is to analyze the percentage of requests that fall into this category. The 80-20 rule is usually a good practice to follow. For example, it is acceptable if only 20 percent of the requests are sinkholes. However, if 80 percent of the requests are sinkholes, it is a good indicator that the layered architecture is not the correct architecture style for the problem domain. Another approach to solving the architecture sinkhole anti-pattern is to make all the layers in the architecture open, realizing, of course, that the trade-off is increased difficulty in managing change within the architecture. 每个分层架构至少会有一些场景落入架构陷阱反模式。确定架构陷阱反模式是否存在的关键是分析落入此类别的请求百分比。80-20 规则通常是一个好的实践。例如,如果只有 20%的请求是陷阱请求,那是可以接受的。然而,如果 80%的请求是陷阱请求,这就很好地表明分层架构并不是该问题领域的正确架构风格。解决架构陷阱反模式的另一种方法是使架构中的所有层都是开放的,当然要意识到,这样做的权衡是增加了在架构中管理变更的难度。
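The percentage check described above is simple arithmetic. The sketch below assumes the sinkhole and total request counts have already been gathered somehow (for example, from request tracing); the class name, method names, and sample numbers are hypothetical, and treating 20 percent as a hard cutoff is a simplification of the rule of thumb.

```java
class SinkholeCheck {
    // Percentage of requests that passed through every layer untouched.
    static double sinkholePercent(long sinkholeRequests, long totalRequests) {
        if (totalRequests <= 0) {
            throw new IllegalArgumentException("totalRequests must be positive");
        }
        return 100.0 * sinkholeRequests / totalRequests;
    }

    // Assumption: 20 percent is used as the acceptable ceiling,
    // following the 80-20 rule of thumb.
    static boolean withinRuleOfThumb(long sinkholeRequests, long totalRequests) {
        return sinkholePercent(sinkholeRequests, totalRequests) <= 20.0;
    }

    public static void main(String[] args) {
        System.out.println(sinkholePercent(150, 1000));   // 15.0
        System.out.println(withinRuleOfThumb(150, 1000)); // true
        System.out.println(withinRuleOfThumb(800, 1000)); // false
    }
}
```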
Why Use This Architecture Style 为什么使用这种架构风格
The layered architecture style is a good choice for small, simple applications or websites. It is also a good architecture choice, particularly as a starting point, for situations with very tight budget and time constraints. Because of the simplicity and familiarity among developers and architects, the layered architecture is perhaps one of the lowest-cost architecture styles, promoting ease of development for smaller applications. The layered architecture style is also a good choice when an architect is still analyzing business needs and requirements and is unsure which architecture style would be best. 分层架构风格是小型、简单应用程序或网站的一个不错选择。它也是一个好的架构选择,特别是在预算和时间限制非常紧张的情况下,作为起点。由于其简单性和开发人员及架构师之间的熟悉度,分层架构可能是成本最低的架构风格之一,促进了小型应用程序的开发便利性。当架构师仍在分析业务需求和要求,并不确定哪种架构风格最合适时,分层架构风格也是一个不错的选择。
As applications using the layered architecture style grow, characteristics like maintainability, agility, testability, and deployability are adversely affected. For this reason, large applications and systems using the layered architecture might be better suited for other, more modular architecture styles. 随着使用分层架构风格的应用程序的增长,维护性、灵活性、可测试性和可部署性等特性受到不利影响。因此,使用分层架构的大型应用程序和系统可能更适合其他更模块化的架构风格。
Architecture Characteristics Ratings 架构特性评级
A one-star rating in the characteristics ratings table (shown in Figure 10-6) means the specific architecture characteristic isn’t well supported in the architecture, whereas a five-star rating means the architecture characteristic is one of the strongest features in the architecture style. The definition for each characteristic identified in the scorecard can be found in Chapter 4. 在特征评分表中(如图 10-6 所示)的一星评级意味着特定的架构特征在该架构中支持不佳,而五星评级则意味着该架构特征是该架构风格中最强的特征之一。评分卡中识别的每个特征的定义可以在第 4 章找到。
Figure 10-6. Layered architecture characteristics ratings 图 10-6. 分层架构特征评分
Overall cost and simplicity are the primary strengths of the layered architecture style. Being monolithic in nature, layered architectures don’t have the complexities associated with distributed architecture styles, are simple and easy to understand, and are relatively low cost to build and maintain. However, as a cautionary note, these ratings start to quickly diminish as monolithic layered architectures get bigger and consequently more complex. 分层架构风格的主要优点是整体成本和简单性。由于其单体特性,分层架构没有与分布式架构风格相关的复杂性,简单易懂,构建和维护的成本相对较低。然而,作为一个警示,这些优点在单体分层架构变得更大并因此变得更复杂时,迅速减弱。
Both deployability and testability rate very low for this architecture style. Deployability rates low due to the ceremony of deployment (effort to deploy), high risk, and lack of frequent deployments. A simple three-line change to a class file in the layered architecture style requires the entire deployment unit to be redeployed, taking in potential database changes, configuration changes, or other coding changes sneaking in alongside the original change. Furthermore, this simple three-line change is usually bundled with dozens of other changes, thereby increasing deployment risk even further (as well as increasing the frequency of deployment). The low testability rating also reflects this scenario; with a simple three-line change, most developers are not going to spend hours executing the entire regression test suite (even if such a thing were to exist in the first place), particularly along with dozens of other changes being made to the monolithic application at the same time. We gave testability a two-star rating (rather than one star) due to the ability to mock or stub components (or even an entire layer), which eases the overall testing effort. 这种架构风格的可部署性和可测试性评分都非常低。可部署性评分低是由于部署的繁琐(部署所需的努力)、高风险以及缺乏频繁的部署。在分层架构风格中,对类文件进行简单的三行更改需要重新部署整个部署单元,这可能涉及数据库更改、配置更改或其他与原始更改一起潜入的编码更改。此外,这个简单的三行更改通常与其他数十个更改捆绑在一起,从而进一步增加了部署风险(以及增加了部署的频率)。低可测试性评分也反映了这种情况;对于简单的三行更改,大多数开发人员不会花费数小时执行整个回归测试套件(即使这样的东西首先存在),特别是在与同时对单体应用程序进行的数十个其他更改一起进行时。我们给可测试性打了两颗星的评分(而不是一颗星),因为能够模拟或存根组件(甚至整个层),这减轻了整体测试工作量。
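To make the point about stubbing an entire layer concrete, here is a minimal sketch. The names OrderRepository and DiscountService, and the discount rule itself, are hypothetical; the only thing being shown is that a business-layer component can be exercised with a stand-in for the persistence layer.

```java
// Contract of the persistence layer (hypothetical name).
interface OrderRepository {
    long totalCentsFor(String orderId);
}

// Business-layer component under test (hypothetical name and rule).
class DiscountService {
    private final OrderRepository repository;

    DiscountService(OrderRepository repository) {
        this.repository = repository;
    }

    // Illustrative rule: orders of 100.00 or more get 10 percent off.
    long discountedTotalCents(String orderId) {
        long total = repository.totalCentsFor(orderId);
        return total >= 10_000 ? total - total / 10 : total;
    }
}

class StubbedLayerDemo {
    public static void main(String[] args) {
        // The lambda stands in for the whole persistence layer,
        // so no database is needed to exercise the business rule.
        OrderRepository stub = orderId -> 20_000L;
        DiscountService service = new DiscountService(stub);
        System.out.println(service.discountedTotalCents("any-id")); // 18000
    }
}
```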
Overall reliability rates medium (three stars) in this architecture style, mostly due to the lack of network traffic, bandwidth, and latency found in most distributed architectures. We only gave the layered architecture three stars for reliability because of the nature of the monolithic deployment, combined with the low ratings for testability (completeness of testing) and deployment risk. 在这种架构风格中,整体可靠性评级为中等(三颗星),主要是由于大多数分布式架构中缺乏网络流量、带宽和延迟。我们仅给分层架构三颗星的可靠性评级,因为单体部署的性质,加上可测试性(测试的完整性)和部署风险的低评级。
Elasticity and scalability rate very low (one star) for the layered architecture, primarily due to monolithic deployments and the lack of architectural modularity. Although it is possible to make certain functions within a monolith scale more than others, this effort usually requires very complex design techniques such as multithreading, internal messaging, and other parallel processing practices, techniques this architecture isn’t well suited for. However, because the layered architecture is always a single system quantum due to the monolithic user interface, backend processing, and monolithic database, applications can only scale to a certain point based on the single quantum. 分层架构的弹性和可扩展性评分非常低(一个星),主要是由于单体部署和缺乏架构模块化。尽管在单体中可以使某些功能的扩展性超过其他功能,但这一努力通常需要非常复杂的设计技术,如多线程、内部消息传递和其他并行处理实践,而这些技术并不适合这种架构。然而,由于分层架构始终是一个单一系统量子,原因在于单体用户界面、后端处理和单体数据库,应用程序只能根据单一量子扩展到某个点。
Performance is always an interesting characteristic to rate for the layered architecture. We gave it only two stars because the architecture style simply does not lend itself to high-performance systems due to the lack of parallel processing, closed layering, and the sinkhole architecture anti-pattern. Like scalability, performance can be addressed through caching, multithreading, and the like, but it is not a natural characteristic of this architecture style; architects and developers have to work hard to make all this happen. 性能始终是分层架构中一个有趣的特征。我们只给了它两个星级,因为这种架构风格根本不适合高性能系统,原因在于缺乏并行处理、封闭层次和漏斗架构反模式。与可扩展性一样,性能可以通过缓存、多线程等方式来解决,但这并不是这种架构风格的自然特征;架构师和开发人员必须付出很大努力才能实现这一切。
Layered architectures don’t support fault tolerance due to monolithic deployments and the lack of architectural modularity. If one small part of a layered architecture causes an out-of-memory condition to occur, the entire application unit is impacted and crashes. Furthermore, overall availability is impacted due to the high mean-time-to-recovery (MTTR) usually experienced by most monolithic applications, with startup times ranging anywhere from 2 minutes for smaller applications, up to 15 minutes or more for most large applications. 分层架构由于单体部署和缺乏架构模块化,不支持容错。如果分层架构的一个小部分导致出现内存不足的情况,整个应用单元都会受到影响并崩溃。此外,由于大多数单体应用通常经历的高平均恢复时间(MTTR),整体可用性也受到影响,启动时间从较小应用的 2 分钟到大多数大型应用的 15 分钟或更长时间不等。
Pipeline Architecture Style 管道架构风格
One of the fundamental styles in software architecture that appears again and again is the pipeline architecture (also known as the pipes and filters architecture). As soon as developers and architects decided to split functionality into discrete parts, this pattern followed. Most developers know this architecture as the underlying principle behind Unix terminal shell languages, such as Bash and Zsh. 软件架构中反复出现的基本风格之一是管道架构(也称为管道和过滤器架构)。一旦开发人员和架构师决定将功能拆分为离散部分,这种模式便随之而来。大多数开发人员都知道这种架构,因为它是 Unix 终端 shell 语言(如 Bash 和 Zsh)背后的基本原理。
Developers in many functional programming languages will see parallels between language constructs and elements of this architecture. In fact, many tools that utilize the MapReduce programming model follow this basic topology. While these examples show a low-level implementation of the pipeline architecture style, it can also be used for higher-level business applications. 许多函数式编程语言中的开发人员会看到语言构造与该架构元素之间的相似之处。事实上,许多利用 MapReduce 编程模型的工具遵循这种基本拓扑。虽然这些示例展示了管道架构风格的低级实现,但它也可以用于更高级的业务应用。
Topology 拓扑
The topology of the pipeline architecture consists of pipes and filters, illustrated in Figure 11-1. 管道架构的拓扑由管道和过滤器组成,如图 11-1 所示。
Figure 11-1. Basic topology for pipeline architecture 图 11-1. 管道架构的基本拓扑
The pipes and filters coordinate in a specific fashion, with pipes forming one-way communication between filters, usually in a point-to-point fashion. 管道和过滤器以特定方式协调,管道在过滤器之间形成单向通信,通常是点对点的方式。
Pipes 管道
Pipes in this architecture form the communication channel between filters. Each pipe is typically unidirectional and point-to-point (rather than broadcast) for performance reasons, accepting input from one source and always directing output to another. The payload carried on the pipes may be any data format, but architects favor smaller amounts of data to enable high performance. 在这种架构中,管道形成过滤器之间的通信通道。每个管道通常是单向的和点对点的(而不是广播的),出于性能考虑,从一个源接收输入,并始终将输出定向到另一个源。管道上承载的有效载荷可以是任何数据格式,但架构师更倾向于较小的数据量以实现高性能。
Filters 过滤器
Filters are self-contained, independent from other filters, and generally stateless. Filters should perform one task only. Composite tasks should be handled by a sequence of filters rather than a single one. 过滤器是自包含的,独立于其他过滤器,并且通常是无状态的。过滤器应该只执行一个任务。复合任务应该通过一系列过滤器来处理,而不是单个过滤器。
Four types of filters exist within this architecture style: 在这种架构风格中存在四种类型的过滤器:
Producer 生产者
The starting point of a process, outbound only, sometimes called the source. 一个过程的起点,仅限输出,有时称为源。
Transformer 转换器
Accepts input, optionally performs a transformation on some or all of the data, then forwards it to the outbound pipe. Functional advocates will recognize this feature as map. 接受输入,选择性地对部分或全部数据进行转换,然后将其转发到出站管道。功能倡导者会将此特性称为 map。
Tester 测试者
Accepts input, tests one or more criteria, then optionally produces output, based on the test. Functional programmers will recognize this as similar to reduce. 接受输入,测试一个或多个标准,然后根据测试可选地生成输出。函数式程序员会将其视为类似于 reduce。
Consumer 消费者
The termination point for the pipeline flow. Consumers sometimes persist the final result of the pipeline process to a database, or they may display the final results on a user interface screen. 管道流的终止点。消费者有时将管道过程的最终结果持久化到数据库中,或者可能在用户界面屏幕上显示最终结果。
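The four filter types can be sketched with the Java Stream API. This is an illustration only: the class name and sample data are hypothetical, and the mapping of filter types onto stream operations is an approximation (in this API the tester role is played by `filter`).

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FilterTypesDemo {
    static List<String> run(List<String> rawLines) {
        // Producer: the starting point, outbound only.
        Stream<String> producer = rawLines.stream();

        return producer
                // Transformer: changes each item, then forwards it.
                .map(String::trim)
                // Tester: forwards an item only if it passes the test.
                .filter(line -> !line.isEmpty())
                // Consumer: the termination point; here it collects results.
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("  alpha ", "   ", "beta"))); // [alpha, beta]
    }
}
```

Each stage does one thing and knows nothing about its neighbors, which is what makes the stages easy to recombine.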
The unidirectional nature and simplicity of each of the pipes and filters encourages compositional reuse. Many developers have discovered this ability using shells. A famous story from the blog “More Shell, Less Egg” illustrates just how powerful these abstractions are. Donald Knuth was asked to write a program to solve this text handling problem: read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies. He wrote a program consisting of more than 10 pages of Pascal, designing (and documenting) a new algorithm along the way. Then, Doug McIlroy demonstrated a shell script that would easily fit within a Twitter post that solved the problem more simply, elegantly, and understandably (if you understand shell commands): 每个管道和过滤器的单向特性和简单性鼓励了组合重用。许多开发者通过使用 shell 发现了这种能力。来自博客“More Shell, Less Egg”的一个著名故事说明了这些抽象是多么强大。唐纳德·克努斯被要求编写一个程序来解决这个文本处理问题:读取一个文本文件,确定使用频率最高的 n 个单词,并打印出这些单词及其频率的排序列表。他编写了一个超过 10 页的 Pascal 程序,在此过程中设计(并记录)了一个新算法。然后,道格·麦基尔罗伊演示了一个 shell 脚本,该脚本短到可以轻松放进一条 Twitter 帖子,并以更简单、优雅且易于理解的方式解决了这个问题(如果你理解 shell 命令):
tr -cs A-Za-z '\n' |
tr A-Z a-z |
sort |
uniq -c |
sort -rn |
sed ${1}q
Even the designers of Unix shells are often surprised at the inventive uses developers have wrought with their simple but powerfully composite abstractions. 甚至连 Unix shell 的设计者也常常对开发者利用其简单但强大复合抽象所创造的创新用法感到惊讶。
Example 示例
The pipeline architecture pattern appears in a variety of applications, especially tasks that facilitate simple, one-way processing. For example, many Electronic Data Interchange (EDI) tools use this pattern, building transformations from one document type to another using pipes and filters. ETL tools (extract, transform, and load) leverage the pipeline architecture as well for the flow and modification of data from one database or data source to another. Orchestrators and mediators such as Apache Camel utilize the pipeline architecture to pass information from one step in a business process to another. 管道架构模式出现在各种应用中,特别是那些促进简单单向处理的任务。例如,许多电子数据交换(EDI)工具使用这种模式,通过管道和过滤器构建从一种文档类型到另一种文档类型的转换。ETL 工具(提取、转换和加载)也利用管道架构来实现数据从一个数据库或数据源流动和修改到另一个。像 Apache Camel 这样的协调者和中介利用管道架构将信息从业务流程中的一个步骤传递到另一个步骤。
To illustrate how the pipeline architecture can be used, consider the following example, as illustrated in Figure 11-2, where various service telemetry information is sent from services via streaming to Apache Kafka. 为了说明管道架构如何使用,考虑以下示例,如图 11-2 所示,其中各种服务遥测信息通过流式传输从服务发送到 Apache Kafka。
Figure 11-2. Pipeline architecture example 图 11-2. 管道架构示例
Notice in Figure 11-2 the use of the pipeline architecture style to process the different kinds of data streamed to Kafka. The Service Info Capture filter (producer filter) subscribes to the Kafka topic and receives service information. It then sends this captured data to a tester filter called Duration Filter to determine whether the data captured from Kafka is related to the duration (in milliseconds) of the service request. Notice the separation of concerns between the filters; the Service Info Capture filter is only concerned about how to connect to a Kafka topic and receive streaming data, whereas the Duration Filter is only concerned about qualifying the data and optionally routing it to the next pipe. If the data is related to the duration (in milliseconds) of the service request, then the Duration Filter passes the data on to the Duration Calculator transformer filter. Otherwise, it passes it on to the Uptime Filter tester filter to check if the data is related to uptime metrics. If it is not, then the pipeline ends, because the data is of no interest to this particular processing flow. Otherwise, if it is uptime metrics, it then passes the data along to the Uptime Calculator to calculate the uptime metrics for the service. These transformers then pass the modified data to the Database Output consumer, which then persists the data in a MongoDB database. 请注意图 11-2 中使用管道架构风格处理流向 Kafka 的不同类型数据。服务信息捕获过滤器(生产者过滤器)订阅 Kafka 主题并接收服务信息。然后,它将捕获的数据发送到一个名为持续时间过滤器的测试过滤器,以确定从 Kafka 捕获的数据是否与服务请求的持续时间(以毫秒为单位)相关。请注意过滤器之间的关注点分离;服务信息捕获过滤器只关心如何连接到 Kafka 主题并接收流数据,而持续时间过滤器只关心对数据进行资格验证,并可选择将其路由到下一个管道。如果数据与服务请求的持续时间(以毫秒为单位)相关,则持续时间过滤器将数据传递给持续时间计算器转换过滤器。否则,它将数据传递给正常运行时间过滤器测试过滤器,以检查数据是否与正常运行时间指标相关。如果不是,则管道结束,因为该数据与这个特定的处理流程无关。否则,如果是正常运行时间指标,它会将数据传递给正常运行时间计算器,以计算服务的正常运行时间指标。这些转换器然后将修改后的数据传递给数据库输出消费者,后者将数据持久化到 MongoDB 数据库中。
This example shows the extensibility properties of the pipeline architecture. For example, in Figure 11-2, a new tester filter could easily be added after the Uptime Filter to pass the data on to another newly gathered metric, such as the database connection wait time. 此示例展示了管道架构的可扩展性特性。例如,在图 11-2 中,可以在 Uptime Filter 之后轻松添加一个新的测试过滤器,以将数据传递给另一个新收集的指标,例如数据库连接等待时间。
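The flow in Figure 11-2 can be approximated in plain Java. The sketch below makes several assumptions that are not in the text: the Metric class and its fields are hypothetical, an in-memory list stands in for the Kafka subscription, a list of strings stands in for the MongoDB output, and the two calculations are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical shape of one telemetry message.
class Metric {
    final String type;   // for example "duration" or "uptime"
    final double value;

    Metric(String type, double value) {
        this.type = type;
        this.value = value;
    }
}

class TelemetryPipeline {
    // Consumer filter: stands in for the Database Output filter.
    final List<String> stored = new ArrayList<>();

    // Tester filters: qualify the data and optionally pass it on.
    Optional<Metric> durationFilter(Metric m) {
        return "duration".equals(m.type) ? Optional.of(m) : Optional.empty();
    }

    Optional<Metric> uptimeFilter(Metric m) {
        return "uptime".equals(m.type) ? Optional.of(m) : Optional.empty();
    }

    // Transformer filters: the calculations are placeholders.
    String durationCalculator(Metric m) {
        return "duration_seconds=" + (m.value / 1000.0);
    }

    String uptimeCalculator(Metric m) {
        return "uptime_percent=" + m.value;
    }

    // Wires the filters together in the order described in the text.
    void process(Metric m) {
        Optional<Metric> duration = durationFilter(m);
        if (duration.isPresent()) {
            stored.add(durationCalculator(duration.get()));
            return;
        }
        // Not duration data: hand it to the next tester.
        uptimeFilter(m).map(this::uptimeCalculator).ifPresent(stored::add);
        // Anything else is dropped; the pipeline ends for that message.
    }

    public static void main(String[] args) {
        TelemetryPipeline pipeline = new TelemetryPipeline();
        // Producer filter: an in-memory list instead of a Kafka subscription.
        List<Metric> captured = List.of(
                new Metric("duration", 2500),
                new Metric("uptime", 99.5),
                new Metric("cpu", 0.42));
        captured.forEach(pipeline::process);
        System.out.println(pipeline.stored);
    }
}
```

Adding a new metric in this sketch means adding one more tester and transformer pair after the uptime branch, without changing the existing filters.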
Architecture Characteristics Ratings 架构特性评级
A one-star rating in the characteristics ratings table (shown in Figure 11-3) means the specific architecture characteristic isn’t well supported in the architecture, whereas a five-star rating means the architecture characteristic is one of the strongest features in the architecture style. The definition for each characteristic identified in the scorecard can be found in Chapter 4. 在特征评分表(如图 11-3 所示)中,一星评级意味着特定的架构特征在架构中支持不佳,而五星评级则意味着该架构特征是架构风格中最强的特征之一。评分卡中识别的每个特征的定义可以在第 4 章找到。
The pipeline architecture style is a technically partitioned architecture due to the partitioning of application logic into filter types (producer, tester, transformer, and consumer). Also, because the pipeline architecture is usually implemented as a monolithic deployment, the architectural quantum is always one. 管道架构风格是一种技术上分区的架构,因为应用逻辑被划分为过滤器类型(生产者、测试者、转换器和消费者)。此外,由于管道架构通常作为单体部署实现,架构量始终为一。
Figure 11-3. Pipeline architecture characteristics ratings 图 11-3. 管道架构特性评分
Overall cost and simplicity combined with modularity are the primary strengths of the pipeline architecture style. Being monolithic in nature, pipeline architectures don’t have the complexities associated with distributed architecture styles, are simple and easy to understand, and are relatively low cost to build and maintain. Architectural modularity is achieved through the separation of concerns between the various filter types and transformers. Any of these filters can be modified or replaced without impacting the other filters. For instance, in the Kafka example illustrated in Figure 11-2, the Duration Calculator can be modified to change the duration calculation without impacting any other filter. 管道架构风格的主要优势是整体成本和简单性与模块化相结合。由于其单体特性,管道架构没有与分布式架构风格相关的复杂性,简单易懂,构建和维护的成本相对较低。通过不同过滤器类型和变换器之间的关注点分离,实现了架构的模块化。这些过滤器中的任何一个都可以被修改或替换,而不会影响其他过滤器。例如,在图 11-2 中所示的 Kafka 示例中,持续时间计算器可以被修改以更改持续时间计算,而不会影响任何其他过滤器。
Deployability and testability, while only around average, rate slightly higher than the layered architecture due to the level of modularity achieved through filters. That said, this architecture style is still a monolith, and as such, ceremony, risk, frequency of deployment, and completion of testing still impact the pipeline architecture. 可部署性和可测试性虽然仅处于平均水平,但由于通过过滤器实现的模块化程度略高于分层架构。因此,这种架构风格仍然是一个整体,因此,仪式、风险、部署频率和测试完成情况仍然会影响管道架构。
Like the layered architecture, overall reliability rates medium (three stars) in this architecture style, mostly due to the lack of network traffic, bandwidth, and latency found in most distributed architectures. We only gave it three stars for reliability because of the nature of the monolithic deployment of this architecture style in conjunction with testability and deployability issues (such as having to test the entire monolith and deploy the entire monolith for any given change). 与分层架构类似,这种架构风格的整体可靠性评级为中等(三颗星),主要是由于大多数分布式架构中缺乏网络流量、带宽和延迟。我们仅为其可靠性给予三颗星,原因在于这种架构风格的单体部署特性以及可测试性和可部署性问题(例如,必须测试整个单体并为任何给定的更改部署整个单体)。
Elasticity and scalability rate very low (one star) for the pipeline architecture, primarily due to monolithic deployments. Although it is possible to make certain functions within a monolith scale more than others, this effort usually requires very complex design techniques such as multithreading, internal messaging, and other parallel processing practices, techniques this architecture isn’t well suited for. However, because the pipeline architecture is always a single system quantum due to the monolithic user interface, backend processing, and monolithic database, applications can only scale to a certain point based on the single architecture quantum. 管道架构的弹性和可扩展性评分非常低(一个星),主要是由于单体部署。尽管在单体中可以使某些功能的扩展性超过其他功能,但这通常需要非常复杂的设计技术,如多线程、内部消息传递和其他并行处理实践,而这些技术并不适合这种架构。然而,由于管道架构始终是一个单一的系统量子,原因在于单体用户界面、后端处理和单体数据库,应用程序只能根据单一架构量子扩展到某个点。
Pipeline architectures don’t support fault tolerance due to monolithic deployments and the lack of architectural modularity. If one small part of a pipeline architecture causes an out-of-memory condition to occur, the entire application unit is impacted and crashes. Furthermore, overall availability is impacted due to the high mean time to recovery (MTTR) usually experienced by most monolithic applications, with startup times ranging anywhere from 2 minutes for smaller applications, up to 15 minutes or more for most large applications. 管道架构由于单体部署和缺乏架构模块化,不支持容错。如果管道架构的一个小部分导致出现内存不足的情况,则整个应用单元都会受到影响并崩溃。此外,由于大多数单体应用通常经历的高平均恢复时间(MTTR),整体可用性也受到影响,启动时间从较小应用的 2 分钟到大多数大型应用的 15 分钟或更长时间不等。
Microkernel Architecture Style 微内核架构风格
The microkernel architecture style (also referred to as the plug-in architecture) was coined several decades ago and is still widely used today. This architecture style is a natural fit for product-based applications (packaged and made available for download and installation as a single, monolithic deployment, typically installed on the customer’s site as a third-party product) but is widely used in many nonproduct custom business applications as well. 微内核架构风格(也称为插件架构)是在几十年前提出的,至今仍被广泛使用。这种架构风格非常适合基于产品的应用程序(作为单一的、整体的部署进行打包并提供下载和安装,通常安装在客户现场作为第三方产品),但在许多非产品的定制业务应用程序中也被广泛使用。
Topology 拓扑
The microkernel architecture style is a relatively simple monolithic architecture consisting of two architecture components: a core system and plug-in components. Application logic is divided between independent plug-in components and the basic core system, providing extensibility, adaptability, and isolation of application features and custom processing logic. Figure 12-1 illustrates the basic topology of the microkernel architecture style. 微内核架构风格是一种相对简单的单体架构,由两个架构组件组成:核心系统和插件组件。应用逻辑在独立的插件组件和基本核心系统之间进行划分,提供了可扩展性、适应性以及应用特性和自定义处理逻辑的隔离。图 12-1 展示了微内核架构风格的基本拓扑结构。
Figure 12-1. Basic components of the microkernel architecture style 图 12-1. 微内核架构风格的基本组件
Core System 核心系统
The core system is formally defined as the minimal functionality required to run the system. The Eclipse IDE is a good example of this. The core system of Eclipse is just a basic text editor: open a file, change some text, and save the file. It’s not until you add plug-ins that Eclipse starts becoming a usable product. However, another definition of the core system is the happy path (general processing flow) through the application, with little or no custom processing. Removing the cyclomatic complexity of the core system and placing it into separate plug-in components allows for better extensibility and maintainability, as well as increased testability. For example, suppose an electronic device recycling application must perform specific custom assessment rules for each electronic device received. The Java code for this sort of processing might look as follows: 核心系统被正式定义为运行系统所需的最小功能。Eclipse IDE 就是一个很好的例子。Eclipse 的核心系统只是一个基本的文本编辑器:打开一个文件,修改一些文本,然后保存文件。直到你添加插件,Eclipse 才开始变成一个可用的产品。然而,核心系统的另一个定义是应用程序中的快乐路径(一般处理流程),几乎没有或没有自定义处理。将核心系统的圈复杂度移除并放入单独的插件组件中,可以实现更好的可扩展性和可维护性,以及提高可测试性。例如,假设一个电子设备回收应用程序必须对每个收到的电子设备执行特定的自定义评估规则。这种处理的 Java 代码可能如下所示:
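public void assessDevice(String deviceID) {
    // One branch per supported device; the chain grows with every new device.
    if (deviceID.equals("iPhone6s")) {
        assessiPhone6s();
    } else if (deviceID.equals("iPad1")) {
        assessiPad1();
    } else if (deviceID.equals("Galaxy5")) {
        assessGalaxy5();
    } else {
        // ... further device-specific branches
    }
}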
Rather than placing all this client-specific customization in the core system with lots of cyclomatic complexity, it is much better to create a separate plug-in component for each electronic device being assessed. Not only do specific client plug-in components isolate independent device logic from the rest of the processing flow, but they also allow for expandability. Adding a new electronic device to assess is simply a matter of adding a new plug-in component and updating the registry. With the microkernel architecture style, assessing an electronic device only requires the core system to locate and invoke the corresponding device plug-ins as illustrated in this revised source code: 与其将所有这些客户特定的定制放在核心系统中,导致大量的圈复杂度,不如为每个被评估的电子设备创建一个单独的插件组件。特定客户的插件组件不仅将独立的设备逻辑与其余处理流程隔离开来,还允许扩展性。添加一个新的电子设备进行评估只需添加一个新的插件组件并更新注册表。使用微内核架构风格,评估电子设备只需核心系统定位并调用相应的设备插件,如以下修订的源代码所示:
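A minimal sketch of that lookup-and-invoke flow follows. It is an assumption-laden illustration, not a reproduction of any particular listing: the DevicePlugin interface, the plug-in classes, and the in-memory registry keyed by device ID are all hypothetical, and assessDevice returns a string here only so the result can be printed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Contract every device plug-in implements (hypothetical name).
interface DevicePlugin {
    String assess();
}

// Each device's assessment rules live in their own plug-in component.
class IPhone6sPlugin implements DevicePlugin {
    @Override
    public String assess() {
        return "iPhone6s assessed";
    }
}

class IPad1Plugin implements DevicePlugin {
    @Override
    public String assess() {
        return "iPad1 assessed";
    }
}

// Core system: no device-specific branching, only lookup and invocation.
class AssessmentCore {
    private final Map<String, Supplier<DevicePlugin>> pluginRegistry = new HashMap<>();

    void register(String deviceID, Supplier<DevicePlugin> factory) {
        pluginRegistry.put(deviceID, factory);
    }

    String assessDevice(String deviceID) {
        Supplier<DevicePlugin> factory = pluginRegistry.get(deviceID);
        if (factory == null) {
            throw new IllegalArgumentException("No plug-in registered for " + deviceID);
        }
        return factory.get().assess();
    }

    public static void main(String[] args) {
        AssessmentCore core = new AssessmentCore();
        core.register("iPhone6s", IPhone6sPlugin::new);
        core.register("iPad1", IPad1Plugin::new);
        System.out.println(core.assessDevice("iPad1")); // iPad1 assessed
    }
}
```

Supporting a new device in this sketch takes one new class and one register call; assessDevice itself does not change.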
In this example all of the complex rules and instructions for assessing a particular electronic device are self-contained in a standalone, independent plug-in component that can be generically executed from the core system. 在这个例子中,评估特定电子设备的所有复杂规则和指令都包含在一个独立的、独立的插件组件中,该组件可以从核心系统中通用地执行。
Depending on the size and complexity, the core system can be implemented as a layered architecture or a modular monolith (as illustrated in Figure 12-2). In some cases, the core system can be split into separately deployed domain services, with each domain service containing plug-in components specific to that domain. For example, suppose Payment Processing is the domain service representing the core system. Each payment method (credit card, PayPal, store credit, gift card, and purchase order) would be a separate plug-in component specific to the payment domain. In all of these cases, it is typical for the entire monolithic application to share a single database. 根据规模和复杂性,核心系统可以实现为分层架构或模块化单体(如图 12-2 所示)。在某些情况下,核心系统可以拆分为单独部署的领域服务,每个领域服务包含特定于该领域的插件组件。例如,假设支付处理是代表核心系统的领域服务。每种支付方式(信用卡、PayPal、商店积分、礼品卡和采购订单)都是特定于支付领域的单独插件组件。在所有这些情况下,整个单体应用程序通常共享一个数据库。
Figure 12-2. Variations of the microkernel architecture core system 图 12-2. 微内核架构核心系统的变体
The presentation layer of the core system can be embedded within the core system or implemented as a separate user interface, with the core system providing backend services. As a matter of fact, a separate user interface can also be implemented as a microkernel architecture style. Figure 12-3 illustrates these presentation layer variants in relation to the core system. 核心系统的表现层可以嵌入在核心系统内,或作为一个独立的用户界面实现,核心系统提供后端服务。实际上,独立的用户界面也可以作为微内核架构风格实现。图 12-3 展示了这些表现层变体与核心系统的关系。
Figure 12-3. User interface variants (panel label: Separate User Interface, Multiple Deployment Units, Both Microkernel) 图 12-3. 用户界面变体(图中面板标签:独立用户界面,多个部署单元,均为微内核)
Plug-In Components 插件组件
Plug-in components are standalone, independent components that contain specialized processing, additional features, and custom code meant to enhance or extend the core system. Additionally, they can be used to isolate highly volatile code, creating better maintainability and testability within the application. Ideally, plug-in components should be independent of each other and have no dependencies between them. 插件组件是独立的、自成一体的组件,包含专门的处理、附加功能和旨在增强或扩展核心系统的自定义代码。此外,它们可以用于隔离高度易变的代码,从而在应用程序中实现更好的可维护性和可测试性。理想情况下,插件组件应该彼此独立,并且之间没有依赖关系。
The communication between the plug-in components and the core system is generally point-to-point, meaning the “pipe” that connects the plug-in to the core system is usually a method invocation or function call to the entry-point class of the plug-in component. In addition, the plug-in component can be either compile-based or runtime-based. Runtime plug-in components can be added or removed at runtime without having to redeploy the core system or other plug-ins, and they are usually managed through frameworks such as Open Service Gateway Initiative (OSGi) for Java, Penrose (Java), Jigsaw (Java), or Prism (.NET). Compile-based plug-in components are much simpler to manage but require the entire monolithic application to be redeployed when modified, added, or removed. 插件组件与核心系统之间的通信通常是点对点的,这意味着连接插件与核心系统的“管道”通常是对插件组件的入口类的一个方法调用或函数调用。此外,插件组件可以是基于编译的或基于运行时的。运行时插件组件可以在运行时添加或移除,而无需重新部署核心系统或其他插件,通常通过如 Java 的开放服务网关倡议(OSGi)、Penrose(Java)、Jigsaw(Java)或 Prism(.NET)等框架进行管理。基于编译的插件组件管理起来要简单得多,但在修改、添加或移除时需要重新部署整个单体应用。
Point-to-point plug-in components can be implemented as shared libraries (such as a JAR, DLL, or Gem), package names in Java, or namespaces in C#. Continuing with the electronics recycling assessment application example, each electronic device plugin can be written and implemented as a JAR, DLL, or Ruby Gem (or any other shared library), with the name of the device matching the name of the independent shared library, as illustrated in Figure 12-4. 点对点插件组件可以实现为共享库(例如 JAR、DLL 或 Gem)、Java 中的包名或 C# 中的命名空间。继续以电子设备回收评估应用程序示例,每个电子设备插件可以作为 JAR、DLL 或 Ruby Gem(或任何其他共享库)编写和实现,设备的名称与独立共享库的名称相匹配,如图 12-4 所示。
Figure 12-4. Shared library plug-in implementation 图 12-4. 共享库插件实现
Alternatively, an easier approach shown in Figure 12-5 is to implement each plug-in component as a separate namespace or package name within the same code base or IDE project. When creating the namespace, we recommend the following semantics: app.plugin.<domain>.<context>. For example, consider the namespace app.plugin.assessment.iphone6s. The second node (plugin) makes it clear this component is a plug-in and therefore should strictly adhere to the basic rules regarding plug-in components (namely, that they are self-contained and separate from other plug-ins). The third node describes the domain (in this case, assessment), thereby allowing plug-in components to be organized and grouped by a common purpose. The fourth node (iphone6s) describes the specific context for the plug-in, making it easy to locate the specific device plug-in for modification or testing. 另外,图 12-5 中显示了一种更简单的方法,即在同一代码库或 IDE 项目中将每个插件组件实现为一个单独的命名空间或包名称。在创建命名空间时,我们建议使用以下语义:app.plugin.<domain>.<context>。例如,考虑命名空间 app.plugin.assessment.iphone6s。第二个节点(plugin)清楚地表明该组件是一个插件,因此应严格遵循有关插件组件的基本规则(即,它们是自包含的,并且与其他插件相互分离)。第三个节点描述了领域(在这种情况下是 assessment),从而允许插件组件按共同目的进行组织和分组。第四个节点(iphone6s)描述了插件的特定上下文,使得定位特定设备插件以进行修改或测试变得容易。
Figure 12-5. Package or namespace plug-in implementation 图 12-5. 包或命名空间插件实现
Plug-in components do not always have to be point-to-point communication with the core system. Other alternatives exist, including using REST or messaging as a means to invoke plug-in functionality, with each plug-in being a standalone service (or maybe even a microservice implemented using a container). Although this may sound like a good way to increase overall scalability, note that this topology (illustrated in Figure 12-6) is still only a single architecture quantum due to the monolithic core system. Every request must first go through the core system to get to the plug-in service. 插件组件不一定总是与核心系统进行点对点通信。还有其他替代方案,包括使用 REST 或消息传递作为调用插件功能的手段,每个插件都是一个独立的服务(或者甚至是使用容器实现的微服务)。尽管这听起来是提高整体可扩展性的好方法,但请注意,这种拓扑(如图 12-6 所示)仍然只是一个单一的架构量子,因为核心系统是单体的。每个请求必须首先通过核心系统才能到达插件服务。
Figure 12-6. Remote plug-in access using REST 图 12-6. 使用 REST 的远程插件访问
The benefit of the remote access approach to plug-in components implemented as individual services is that it provides better overall component decoupling, allows for better scalability and throughput, and allows for runtime changes without any special frameworks like OSGi, Jigsaw, or Prism. It also allows for asynchronous communications to plug-ins, which, depending on the scenario, could significantly improve overall user responsiveness. Using the electronics recycling example, rather than having to wait for the electronic device assessment to run, the core system could make an asynchronous request to kick off an assessment for a particular device. When the assessment completes, the plug-in can notify the core system through another asynchronous messaging channel, which in turn would notify the user that the assessment is complete. 将插件组件实现为独立服务并通过远程方式访问的好处在于,它提供了更好的整体组件解耦,允许更好的可扩展性和吞吐量,并允许在没有任何特殊框架(如 OSGi、Jigsaw 或 Prism)的情况下进行运行时更改。它还允许对插件进行异步通信,这在不同场景下可以显著提高整体用户响应能力。以电子产品回收为例,核心系统可以发起对特定设备的评估异步请求,而不必等待电子设备评估的运行。当评估完成时,插件可以通过另一个异步消息通道通知核心系统,核心系统则会通知用户评估已完成。
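The asynchronous hand-off described above can be sketched in a few lines. This is a minimal in-process illustration only: `CompletableFuture` stands in for the messaging channels, and the `DeviceAssessor` interface, the `AsyncAssessmentDemo` class, and the device ID are hypothetical names invented for the sketch.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical plug-in contract for this sketch; not taken from the text.
interface DeviceAssessor {
    String assess(String deviceId);
}

class AsyncAssessmentDemo {
    // The core system kicks off the assessment and returns immediately.
    static CompletableFuture<String> startAssessment(DeviceAssessor plugin, String deviceId) {
        return CompletableFuture.supplyAsync(() -> plugin.assess(deviceId));
    }

    public static void main(String[] args) {
        DeviceAssessor iphonePlugin = deviceId -> "Assessment complete for " + deviceId;

        // The callback plays the role of the second channel that notifies the user.
        CompletableFuture<Void> notification =
            startAssessment(iphonePlugin, "iPhone6s-001")
                .thenAccept(report -> System.out.println("Notify user: " + report));

        System.out.println("Core system keeps handling other requests");
        notification.join(); // wait only so the demo does not exit early
    }
}
```

With a real remote plug-in, the future would typically be completed by a reply message rather than by a local thread, but the core system's non-blocking flow has the same shape.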
With these benefits come trade-offs. Remote plug-in access turns the microkernel architecture into a distributed architecture rather than a monolithic one, making it difficult to implement and deploy for most third-party on-prem products. Furthermore, it creates more overall complexity and cost and complicates the overall deployment topology. If a plug-in becomes unresponsive or is not running, particularly when using REST, the request cannot be completed. This would not be the case with a monolithic deployment. The choice of whether to make the communication to plug-in components from the core system point-to-point or remote should be based on specific requirements and thus requires a careful trade-off analysis of the benefits and drawbacks of such an approach. 这些好处伴随着权衡。远程插件访问将微内核架构转变为分布式架构,而不是单体架构,这使得大多数第三方本地产品的实施和部署变得困难。此外,它增加了整体复杂性和成本,并使整体部署拓扑变得复杂。如果插件变得无响应或未运行,特别是在使用 REST 时,请求将无法完成。这在单体部署中则不会出现。是否选择从核心系统到插件组件的通信是点对点还是远程,应基于具体要求,因此需要对这种方法的利弊进行仔细的权衡分析。
It is not a common practice for plug-in components to connect directly to a centrally shared database. Rather, the core system takes on this responsibility, passing whatever data is needed into each plug-in. The primary reason for this practice is decoupling. Making a database change should only impact the core system, not the plug-in components. That said, plug-ins can have their own separate data stores only accessible to that plug-in. For example, each electronic device assessment plug-in in the electronic recycling system example can have its own simple database or rules engine containing all of the specific assessment rules for each product. The data store owned by the plug-in component can be external (as shown in Figure 12-7), or it could be embedded as part of the plug-in component or monolithic deployment (as in the case of an in-memory or embedded database). 插件组件直接连接到中央共享数据库并不是一种常见做法。相反,核心系统承担这一责任,将所需的数据传递给每个插件。这样做的主要原因是解耦。对数据库的更改应该只影响核心系统,而不是插件组件。也就是说,插件可以拥有仅对该插件可访问的独立数据存储。例如,电子回收系统示例中的每个电子设备评估插件可以拥有自己的简单数据库或规则引擎,包含每个产品的所有特定评估规则。插件组件拥有的数据存储可以是外部的(如图 12-7 所示),也可以作为插件组件或单体部署的一部分嵌入(如内存数据库或嵌入式数据库的情况)。
Figure 12-7. Plug-in components can own their own data store 图 12-7. 插件组件可以拥有自己的数据存储
Registry 注册表
The core system needs to know about which plug-in modules are available and how to get to them. One common way of implementing this is through a plug-in registry. This registry contains information about each plug-in module, including things like its name, data contract, and remote access protocol details (depending on how the plug-in is connected to the core system). For example, a plug-in for tax software that flags high-risk tax audit items might have a registry entry that contains the name of the service (AuditChecker), the data contract (input data and output data), and the contract format (XML). 核心系统需要了解可用的插件模块以及如何访问它们。一种常见的实现方式是通过插件注册表。该注册表包含有关每个插件模块的信息,包括其名称、数据契约和远程访问协议细节(具体取决于插件如何连接到核心系统)。例如,一个用于税务软件的插件,用于标记高风险税务审计项目,可能在注册表中有一个条目,包含服务名称(AuditChecker)、数据契约(输入数据和输出数据)以及契约格式(XML)。
The registry can be as simple as an internal map structure owned by the core system containing a key and the plug-in component reference, or it can be as complex as a registry and discovery tool either embedded within the core system or deployed externally (such as Apache ZooKeeper or Consul). Using the electronics recycling example, the following Java code implements a simple registry within the core system, showing a point-to-point entry, a messaging entry, and a RESTful entry example for assessing an iPhone 6S device: 注册表可以简单到由核心系统拥有的内部映射结构,包含一个键和插件组件引用,或者可以复杂到一个注册和发现工具,嵌入在核心系统内或外部部署(如 Apache ZooKeeper 或 Consul)。使用电子回收的例子,以下 Java 代码在核心系统内实现了一个简单的注册表,展示了一个点对点条目、一个消息条目和一个用于评估 iPhone 6S 设备的 RESTful 条目示例:
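import java.util.HashMap;
import java.util.Map;

// Illustrative wrapper class so the static initializer compiles.
public class PluginRegistry {
    private static final Map<String, String> registry = new HashMap<>();
    static {
        // The entries below are alternatives for one device. Each put() on the
        // same key replaces the previous value, so a real registry keeps one.
        //point-to-point access example
        registry.put("iPhone6s", "Iphone6sPlugin");
        //messaging example
        registry.put("iPhone6s", "iphone6s.queue");
        //restful example
        registry.put("iPhone6s", "https://atlas:443/assess/iphone6s");
    }
}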
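To show how the core system might use a point-to-point registry entry, the following sketch resolves a class name from the registry and calls the plug-in's entry point through reflection. The `DevicePlugin` interface, the body of `Iphone6sPlugin`, and the `RegistryLookupDemo` class are assumptions made for illustration; the registry value is taken from the class itself so the lookup works regardless of package.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entry-point contract; the real one is whatever the core system defines.
interface DevicePlugin {
    String assess();
}

// Hypothetical plug-in class standing in for the Iphone6sPlugin registry value.
class Iphone6sPlugin implements DevicePlugin {
    public String assess() {
        return "iPhone 6s assessed";
    }
}

class RegistryLookupDemo {
    private static final Map<String, String> registry = new HashMap<>();

    static {
        // getName() yields the fully qualified class name of the plug-in.
        registry.put("iPhone6s", Iphone6sPlugin.class.getName());
    }

    // Resolve the class name from the registry and call its entry point.
    static String assess(String device) {
        String className = registry.get(device);
        if (className == null) {
            throw new IllegalArgumentException("No plug-in registered for " + device);
        }
        try {
            DevicePlugin plugin = (DevicePlugin) Class.forName(className)
                .getDeclaredConstructor()
                .newInstance();
            return plugin.assess();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot load plug-in " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(assess("iPhone6s"));
    }
}
```

A messaging or REST entry would replace the reflective call with a queue send or an HTTP request, while the lookup step stays the same.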
Contracts 合同
The contracts between the plug-in components and the core system are usually standard across a domain of plug-in components and include behavior, input data, and output data returned from the plug-in component. Custom contracts are typically found in situations where plug-in components are developed by a third party where you have no control over the contract used by the plug-in. In such cases, it is common to create an adapter between the plug-in contract and your standard contract so that the core system doesn’t need specialized code for each plug-in. 插件组件与核心系统之间的合同通常在一组插件组件的领域内是标准的,包括行为、输入数据和从插件组件返回的输出数据。自定义合同通常出现在插件组件由第三方开发的情况下,在这种情况下,您无法控制插件使用的合同。在这种情况下,通常会在插件合同和您的标准合同之间创建一个适配器,以便核心系统不需要为每个插件编写专门的代码。
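The adapter idea can be illustrated with a small sketch. Everything named here is hypothetical: `StandardPlugin` stands for your standard contract, `VendorGrader` for a third-party plug-in with its own contract, and the scoring threshold is invented for the example.

```java
// Hypothetical standard contract the core system expects from every plug-in.
interface StandardPlugin {
    String assess(String deviceId);
}

// Hypothetical third-party plug-in whose contract we cannot change.
class VendorGrader {
    int gradeDevice(String serial) {
        return serial.isEmpty() ? 0 : 80; // vendor-specific score from 0 to 100
    }
}

// Adapter: exposes the vendor plug-in through the standard contract.
class VendorGraderAdapter implements StandardPlugin {
    private final VendorGrader vendor;

    VendorGraderAdapter(VendorGrader vendor) {
        this.vendor = vendor;
    }

    @Override
    public String assess(String deviceId) {
        int score = vendor.gradeDevice(deviceId);
        // Translate the vendor's numeric score into the standard result.
        return score >= 50 ? "RESELL" : "RECYCLE";
    }
}

class AdapterDemo {
    public static void main(String[] args) {
        // The core system sees only StandardPlugin, never VendorGrader.
        StandardPlugin plugin = new VendorGraderAdapter(new VendorGrader());
        System.out.println(plugin.assess("SN-123"));
    }
}
```

The translation logic lives in one place, so the core system keeps calling a single contract even when more third-party plug-ins are added.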
Plug-in contracts can be implemented in XML, JSON, or even objects passed back and forth between the plug-in and the core system. In keeping with the electronics recycling application, the following contract (implemented as a standard Java interface named AssessmentPlugin) defines the overall behavior (assess(), register(), and deregister()), along with the corresponding output data expected from the plug-in component (AssessmentOutput): 插件合同可以用 XML、JSON,甚至是插件与核心系统之间传递的对象来实现。与电子回收应用程序保持一致,以下合同(作为名为 AssessmentPlugin 的标准 Java 接口实现)定义了整体行为(assess()、register() 和 deregister()),以及插件组件(AssessmentOutput)预期的相应输出数据:
public interface AssessmentPlugin {
public AssessmentOutput assess();
public String register();
public String deregister();
}
public class AssessmentOutput {
public String assessmentReport;
public Boolean resell;
public Double value;
public Double resellPrice;
}
In this contract example, the device assessment plug-in is expected to return the assessment report as a formatted string; a resell flag (true or false) indicating whether this device can be resold on a third-party market or safely disposed of; and finally, if it can be resold (another form of recycling), what the calculated value is of the item and what the recommended resell price should be. 在这个合同示例中,设备评估插件预计将返回格式化字符串的评估报告;一个转售标志(真或假),指示该设备是否可以在第三方市场上转售或安全处置;最后,如果可以转售(另一种回收形式),该物品的计算价值是多少,以及推荐的转售价格应该是多少。
Notice the roles and responsibility model between the core system and the plug-in component in this example, specifically with the assessmentReport field. It is not the responsibility of the core system to format and understand the details of the assessment report, only to either print it out or display it to the user. 注意在这个例子中核心系统和插件组件之间的角色和责任模型,特别是 assessmentReport 字段。核心系统并不负责格式化和理解评估报告的细节,只需将其打印出来或显示给用户。
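A plug-in that fulfills this contract might look like the sketch below. The contract types are repeated (without the `public` modifier) only so the example fits in one file. The `Iphone6sAssessment` class, its rules and amounts, and the choice to return the registry key from `register()` and `deregister()` are assumptions, since the text does not specify them.

```java
// Copies of the contract above, non-public so the sketch fits in one file.
interface AssessmentPlugin {
    AssessmentOutput assess();
    String register();
    String deregister();
}

class AssessmentOutput {
    public String assessmentReport;
    public Boolean resell;
    public Double value;
    public Double resellPrice;
}

// Hypothetical plug-in; the rules and amounts are invented for illustration.
class Iphone6sAssessment implements AssessmentPlugin {
    private final boolean screenCracked;

    Iphone6sAssessment(boolean screenCracked) {
        this.screenCracked = screenCracked;
    }

    @Override
    public AssessmentOutput assess() {
        AssessmentOutput out = new AssessmentOutput();
        out.resell = !screenCracked;
        out.value = screenCracked ? 10.0 : 60.0;
        if (out.resell) {
            out.resellPrice = 75.0; // left null when the device is not resellable
        }
        // The plug-in, not the core system, formats the report.
        out.assessmentReport = "iPhone 6s: screen " + (screenCracked ? "cracked" : "intact");
        return out;
    }

    @Override
    public String register() { return "iPhone6s"; }   // assumed: returns the registry key

    @Override
    public String deregister() { return "iPhone6s"; } // assumed: returns the registry key
}

class CoreSystemDemo {
    // The core system passes the report through without parsing it.
    static String display(AssessmentPlugin plugin) {
        return plugin.assess().assessmentReport;
    }

    public static void main(String[] args) {
        System.out.println(display(new Iphone6sAssessment(false)));
    }
}
```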
Examples and Use Cases 示例和用例
Most of the tools used for developing and releasing software are implemented using the microkernel architecture. Some examples include the Eclipse IDE, PMD, Jira, and Jenkins, to name a few. Internet web browsers such as Chrome and Firefox are another common product example using the microkernel architecture: viewers and other plug-ins add additional capabilities that are not otherwise found in the basic browser representing the core system. The examples are endless for product-based software, but what about large business applications? The microkernel architecture applies to these situations as well. To illustrate this point, consider an insurance company example involving insurance claims processing. 用于开发和发布软件的大多数工具都是使用微内核架构实现的。一些例子包括 Eclipse IDE、PMD、Jira 和 Jenkins 等。像 Chrome 和 Firefox 这样的互联网网页浏览器是另一个使用微内核架构的常见产品示例:查看器和其他插件增加了基本浏览器(代表核心系统)所没有的额外功能。对于基于产品的软件,例子不胜枚举,但大型企业应用程序呢?微内核架构同样适用于这些情况。为了说明这一点,考虑一个涉及保险索赔处理的保险公司示例。
Claims processing is a very complicated process. Each jurisdiction has different rules and regulations for what is and isn’t allowed in an insurance claim. For example, some jurisdictions (e.g., states) allow free windshield replacement if your windshield is damaged by a rock, whereas other states do not. This creates an almost infinite set of conditions for a standard claims process. 索赔处理是一个非常复杂的过程。每个管辖区对保险索赔的允许和不允许的规则和规定都不同。例如,一些管辖区(例如,各州)允许在挡风玻璃被石头损坏时免费更换挡风玻璃,而其他州则不允许。这为标准索赔流程创造了几乎无限的条件。
Most insurance claims applications leverage large and complex rules engines to handle much of this complexity. However, these rules engines can grow into a complex big ball of mud where changing one rule impacts other rules, or making a simple rule change requires an army of analysts, developers, and testers to make sure nothing is broken by a simple change. Using the microkernel architecture pattern can solve many of these issues. 大多数保险索赔应用程序利用大型复杂的规则引擎来处理这些复杂性。然而,这些规则引擎可能会发展成一个复杂的大泥球,其中更改一个规则会影响其他规则,或者进行简单的规则更改需要一大批分析师、开发人员和测试人员来确保简单的更改不会导致任何问题。使用微内核架构模式可以解决许多这些问题。
The claims rules for each jurisdiction can be contained in separate standalone plug-in components (implemented as source code or a specific rules engine instance accessed by the plug-in component). This way, rules can be added, removed, or changed for a particular jurisdiction without impacting any other part of the system. Furthermore, new jurisdictions can be added and removed without impacting other parts of the system. The core system in this example would be the standard process for filing and processing a claim, something that doesn’t change often. 每个管辖区的索赔规则可以包含在单独的独立插件组件中(作为源代码或由插件组件访问的特定规则引擎实例实现)。这样,可以为特定管辖区添加、删除或更改规则,而不会影响系统的其他部分。此外,可以添加和删除新的管辖区,而不会影响系统的其他部分。这个例子中的核心系统将是提交和处理索赔的标准流程,这个流程并不经常改变。
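As a rough sketch of this arrangement, the core system below runs the standard claim process and delegates only the jurisdiction-specific decision to a plug-in. The `ClaimRules` interface, the `ClaimsCore` class, and the jurisdiction codes `STATE_A` and `STATE_B` are hypothetical; they do not describe any real jurisdiction's rules.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical plug-in contract: one implementation per jurisdiction.
interface ClaimRules {
    boolean coversWindshield();
}

class ClaimsCore {
    private final Map<String, ClaimRules> rulesByJurisdiction = new HashMap<>();

    // Jurisdictions are added and removed without touching the claim process.
    void addJurisdiction(String code, ClaimRules rules) {
        rulesByJurisdiction.put(code, rules);
    }

    void removeJurisdiction(String code) {
        rulesByJurisdiction.remove(code);
    }

    // Standard claim process; only the jurisdiction-specific decision is delegated.
    String processWindshieldClaim(String code) {
        ClaimRules rules = rulesByJurisdiction.get(code);
        if (rules == null) {
            return "REJECTED: unsupported jurisdiction";
        }
        return rules.coversWindshield() ? "APPROVED" : "DENIED";
    }
}

class ClaimsDemo {
    public static void main(String[] args) {
        ClaimsCore core = new ClaimsCore();
        core.addJurisdiction("STATE_A", () -> true);  // hypothetical rule set
        core.addJurisdiction("STATE_B", () -> false); // hypothetical rule set
        System.out.println(core.processWindshieldClaim("STATE_A"));
        System.out.println(core.processWindshieldClaim("STATE_B"));
    }
}
```

Changing the rules for one jurisdiction means replacing one map entry; the other entries and the claim process itself are untouched.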
Another example of a large and complex business application that can leverage the microkernel architecture is tax preparation software. For example, the United States has a basic two-page tax form called the 1040 form that contains a summary of all the information needed to calculate a person’s tax liability. Each line in the 1040 tax form has a single number that requires many other forms and worksheets to arrive at that single number (such as gross income). Each of these additional forms and worksheets can be implemented as a plug-in component, with the 1040 summary tax form being the core system (the driver). This way, changes to tax law can be isolated to an independent plug-in component, making changes easier and less risky. 另一个可以利用微内核架构的大型复杂商业应用的例子是税务准备软件。例如,美国有一个基本的两页税表,称为 1040 表,包含计算个人税务责任所需的所有信息的摘要。1040 税表中的每一行都有一个单一的数字,这个数字需要许多其他表格和工作表来得出(例如,毛收入)。这些额外的表格和工作表可以作为插件组件实现,而 1040 摘要税表则是核心系统(驱动程序)。这样,税法的变化可以被隔离到一个独立的插件组件中,使得更改变得更容易且风险更小。
Architecture Characteristics Ratings 架构特性评级
A one-star rating in the characteristics ratings in Figure 12-8 means the specific architecture characteristic isn't well supported in the architecture, whereas a five-star rating means the architecture characteristic is one of the strongest features in the architecture style. The definition for each characteristic identified in the scorecard can be found in Chapter 4. 在图 12-8 的特征评分中,一星评级意味着特定的架构特征在架构中支持不佳,而五星评级则意味着该架构特征是架构风格中最强的特征之一。评分卡中识别的每个特征的定义可以在第 4 章找到。
Figure 12-8. Microkernel architecture characteristics ratings 图 12-8. 微内核架构特征评分
Similar to the layered architecture style, simplicity and overall cost are the main strengths of the microkernel architecture style, and scalability, fault tolerance, and extensibility its main weaknesses. These weaknesses are due to the typical monolithic deployments found with the microkernel architecture. Also, like the layered architecture style, the number of quanta is always singular (one) because all requests must go through the core system to get to independent plug-in components. That's where the similarities end. 与分层架构风格类似,微内核架构风格的主要优点是简单性和整体成本,而可伸缩性、容错性和可扩展性是其主要缺点。这些缺点源于微内核架构中典型的单体部署。此外,与分层架构风格一样,架构量子的数量始终是单一的(一个),因为所有请求必须通过核心系统才能到达独立的插件组件。相似之处到此为止。
The microkernel architecture style is unique in that it is the only architecture style that can be both domain partitioned and technically partitioned. While most microkernel architectures are technically partitioned, the domain partitioning aspect comes about mostly through a strong domain-to-architecture isomorphism. For example, problems that require different configurations for each location or client match extremely well with this architecture style. Another example is a product or application that places a strong emphasis on user customization and feature extensibility (such as Jira or an IDE like Eclipse). 微内核架构风格是独特的,因为它是唯一一种可以同时进行领域划分和技术划分的架构风格。虽然大多数微内核架构在技术上是划分的,但领域划分的方面主要是通过强领域与架构同构关系来实现的。例如,需要为每个位置或客户端配置不同的配置的问题与这种架构风格非常匹配。另一个例子是一个产品或应用程序,它非常强调用户自定义和功能可扩展性(例如 Jira 或像 Eclipse 这样的 IDE)。
Testability, deployability, and reliability rate a little above average (three stars), primarily because functionality can be isolated to independent plug-in components. If done right, this reduces the overall testing scope of changes and also reduces overall risk of deployment, particularly if plug-in components are deployed in a runtime fashion. 可测试性、可部署性和可靠性略高于平均水平(三颗星),主要是因为功能可以被隔离到独立的插件组件。如果做得正确,这将减少变更的整体测试范围,并降低整体部署风险,特别是当插件组件以运行时方式部署时。
Modularity and extensibility also rate a little above average (three stars). With the microkernel architecture style, additional functionality can be added, removed, and changed through independent, self-contained plug-in components, thereby making it relatively easy to extend and enhance applications created using this architecture style and allowing teams to respond to changes much faster. Consider the tax preparation software example from the previous section. If the US tax law changes (which it does all the time), requiring a new tax form, that new tax form can be created as a plug-in component and added to the application without much effort. Similarly, if a tax form or worksheet is no longer needed, that plug-in can simply be removed from the application. 模块化和可扩展性也略高于平均水平(三颗星)。使用微内核架构风格,可以通过独立的、自包含的插件组件添加、移除和更改额外功能,从而使得使用这种架构风格创建的应用程序相对容易扩展和增强,并允许团队更快地响应变化。考虑前一节中的税务准备软件示例。如果美国税法发生变化(这时常发生),需要新的税表,那么可以将该新税表作为插件组件创建并添加到应用程序中,而无需太多努力。同样,如果某个税表或工作表不再需要,可以简单地将该插件从应用程序中移除。
Performance is always an interesting characteristic to rate with the microkernel architecture style. We gave it three stars (a little above average) mostly because microkernel applications are generally small and don’t grow as big as most layered architectures. Also, they don’t suffer as much from the architecture sinkhole antipattern discussed in Chapter 10. Finally, microkernel architectures can be streamlined by unplugging unneeded functionality, therefore making the application run faster. A good example of this is Wildfly (previously the JBoss Application Server). By unplugging unnecessary functionality like clustering, caching, and messaging, the application server performs much faster than with these features in place. 性能始终是用微内核架构风格进行评估的一个有趣特征。我们给它打了三颗星(略高于平均水平),主要是因为微内核应用通常较小,不会像大多数分层架构那样变得庞大。此外,它们也不太受第 10 章讨论的架构陷阱反模式的影响。最后,微内核架构可以通过拔掉不需要的功能来简化,从而使应用程序运行得更快。一个很好的例子是 Wildfly(之前的 JBoss 应用服务器)。通过拔掉不必要的功能,如集群、缓存和消息传递,应用服务器的性能比启用这些功能时快得多。
CHAPTER 13 第 13 章
Service-Based Architecture Style 基于服务的架构风格
Service-based architecture is a hybrid of the microservices architecture style and is considered one of the most pragmatic architecture styles, mostly due to its architectural flexibility. Although service-based architecture is a distributed architecture, it doesn’t have the same level of complexity and cost as other distributed architectures, such as microservices or event-driven architecture, making it a very popular choice for many business-related applications. 基于服务的架构是微服务架构风格的混合体,被认为是最务实的架构风格之一,主要由于其架构灵活性。尽管基于服务的架构是一种分布式架构,但它的复杂性和成本并不如其他分布式架构(如微服务或事件驱动架构)那么高,这使得它成为许多与业务相关的应用程序的热门选择。
Topology 拓扑
The basic topology of service-based architecture follows a distributed macro layered structure consisting of a separately deployed user interface, separately deployed remote coarse-grained services, and a monolithic database. This basic topology is illustrated in Figure 13-1. 基于服务的架构的基本拓扑遵循分布式宏层结构,由单独部署的用户界面、单独部署的远程粗粒度服务和一个单体数据库组成。这个基本拓扑在图 13-1 中进行了说明。
Services within this architecture style are typically coarse-grained “portions of an application” (usually called domain services) that are independent and separately deployed. Services are typically deployed in the same manner as any monolithic application would be (such as an EAR file, WAR file, or assembly) and as such do not require containerization (although you could deploy a domain service in a container such as Docker). Because the services typically share a single monolithic database, the number of services within an application context generally ranges between 4 and 12 services, with the average being about 7 services. 在这种架构风格中,服务通常是粗粒度的“应用程序部分”(通常称为领域服务),它们是独立的并且可以单独部署。服务的部署方式通常与任何单体应用程序相同(例如 EAR 文件、WAR 文件或程序集),因此不需要容器化(尽管您可以将领域服务部署在像 Docker 这样的容器中)。由于服务通常共享一个单一的单体数据库,因此在一个应用程序上下文中,服务的数量一般在 4 到 12 个之间,平均约为 7 个服务。
Figure 13-1. Basic topology of the service-based architecture style 图 13-1. 基于服务的架构风格的基本拓扑
In most cases there is only a single instance of each domain service within a service-based architecture. However, based on scalability, fault tolerance, and throughput needs, multiple instances of a domain service can certainly exist. Multiple instances of a service usually require some sort of load-balancing capability between the user interface and the domain service so that the user interface can be directed to a healthy and available service instance. 在大多数情况下,基于服务的架构中每个领域服务只有一个实例。然而,根据可扩展性、容错性和吞吐量的需求,确实可以存在多个领域服务的实例。多个服务实例通常需要在用户界面和领域服务之间具备某种负载均衡能力,从而使用户界面能够指向一个健康且可用的服务实例。
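One simple way to direct the user interface to a healthy instance is round-robin selection that skips unhealthy instances. The sketch below is an assumption-laden illustration, not a recommendation of a specific product: the `ServiceInstance` and `RoundRobinBalancer` classes and the URLs are hypothetical, and real deployments usually rely on a dedicated load balancer with its own health checks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical description of one deployed instance of a domain service.
class ServiceInstance {
    final String url;
    boolean healthy = true;

    ServiceInstance(String url) {
        this.url = url;
    }
}

class RoundRobinBalancer {
    private final List<ServiceInstance> instances = new ArrayList<>();
    private int next = 0;

    synchronized void add(ServiceInstance instance) {
        instances.add(instance);
    }

    // Returns the next healthy instance in rotation, or throws if none is available.
    synchronized ServiceInstance pick() {
        for (int tried = 0; tried < instances.size(); tried++) {
            ServiceInstance candidate = instances.get(next);
            next = (next + 1) % instances.size();
            if (candidate.healthy) {
                return candidate;
            }
        }
        throw new IllegalStateException("No healthy instance available");
    }
}

class BalancerDemo {
    public static void main(String[] args) {
        RoundRobinBalancer balancer = new RoundRobinBalancer();
        ServiceInstance first = new ServiceInstance("http://order-1.example");  // hypothetical URL
        ServiceInstance second = new ServiceInstance("http://order-2.example"); // hypothetical URL
        balancer.add(first);
        balancer.add(second);

        System.out.println(balancer.pick().url);
        first.healthy = false; // simulate a failed health check
        System.out.println(balancer.pick().url);
    }
}
```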
Services are accessed remotely from a user interface using a remote access protocol. While REST is typically used to access services from the user interface, messaging, remote procedure call (RPC), or even SOAP could be used as well. While an API layer consisting of a proxy or gateway can be used to access services from the user interface (or other external requests), in most cases the user interface accesses the services directly using a service locator pattern embedded within the user interface, API gateway, or proxy. 服务通过用户界面使用远程访问协议进行远程访问。虽然通常使用 REST 从用户界面访问服务,但也可以使用消息传递、远程过程调用(RPC)或甚至 SOAP。虽然可以使用由代理或网关组成的 API 层从用户界面(或其他外部请求)访问服务,但在大多数情况下,用户界面直接使用嵌入在用户界面、API 网关或代理中的服务定位器模式访问服务。
One important aspect of service-based architecture is that it typically uses a centrally shared database. This allows services to leverage SQL queries and joins in the same way a traditional monolithic layered architecture would. Because of the small number of services (4 to 12), database connections are not usually an issue in service-based architecture. Database changes, however, can be an issue. The section “Database Partitioning” on page 169 describes techniques for addressing and managing database change within a service-based architecture. 服务基础架构的一个重要方面是它通常使用一个集中共享的数据库。这使得服务能够像传统的单体分层架构一样利用 SQL 查询和连接。由于服务数量较少(4 到 12 个),在服务基础架构中,数据库连接通常不是问题。然而,数据库更改可能会成为一个问题。第 169 页的“数据库分区”部分描述了在服务基础架构中处理和管理数据库更改的技术。
Topology Variants 拓扑变体
Many topology variants exist within the service-based architecture style, making this perhaps one of the most flexible architecture styles. For example, the single monolithic user interface, as illustrated in Figure 13-1, can be broken apart into user interface domains, even to a level matching each domain service. These user interface variants are illustrated in Figure 13-2. 在基于服务的架构风格中存在许多拓扑变体,这使得它可能是最灵活的架构风格之一。例如,如图 13-1 所示的单一单体用户界面,可以拆分为用户界面域,甚至可以细分到与每个域服务相匹配的级别。这些用户界面变体在图 13-2 中进行了说明。
Figure 13-2. User interface variants 图 13-2. 用户界面变体
Similarly, opportunities may exist to break apart a single monolithic database into separate databases, even going as far as domain-scoped databases matching each domain service (similar to microservices). In these cases it is important to make sure the data in each separate database is not needed by another domain service. This avoids interservice communication between domain services (something to definitely avoid with service-based architecture) and also the duplication of data between databases. These database variants are illustrated in Figure 13-3. 同样,可能存在将单一的单体数据库拆分为多个独立数据库的机会,甚至可以做到与每个领域服务相匹配的领域范围数据库(类似于微服务)。在这些情况下,确保每个独立数据库中的数据不被其他领域服务所需是很重要的。这可以避免领域服务之间的服务间通信(这是基于服务的架构中绝对要避免的)以及数据库之间的数据重复。这些数据库变体在图 13-3 中进行了说明。
Figure 13-3. Database variants 图 13-3. 数据库变体
Finally, it is also possible to add an API layer consisting of a reverse proxy or gateway between the user interface and services, as shown in Figure 13-4. This is a good practice when exposing domain service functionality to external systems or when consolidating shared cross-cutting concerns and moving them outside of the user interface (such as metrics, security, auditing requirements, and service discovery). 最后,还可以在用户界面和服务之间添加一个由反向代理或网关组成的 API 层,如图 13-4 所示。当将领域服务功能暴露给外部系统或在整合共享的跨切关注点并将其移出用户界面时(例如指标、安全性、审计要求和服务发现),这是一种良好的实践。
Figure 13-4. Adding an API layer between the user interface and domain services 图 13-4. 在用户界面和领域服务之间添加 API 层
Service Design and Granularity 服务设计与粒度
Because domain services in a service-based architecture are generally coarse-grained, each domain service is typically designed using a layered architecture style consisting of an API facade layer, a business layer, and a persistence layer. Another popular design approach is to domain partition each domain service using sub-domains similar to the modular monolith architecture style. Each of these design approaches is illustrated in Figure 13-5. 由于基于服务的架构中的领域服务通常是粗粒度的,因此每个领域服务通常采用分层架构风格设计,包括 API 外观层、业务层和持久层。另一种流行的设计方法是使用类似于模块化单体架构风格的子领域对每个领域服务进行领域划分。这些设计方法在图 13-5 中进行了说明。
Figure 13-5. Domain service design variants 图 13-5. 领域服务设计变体
Regardless of the service design, a domain service must contain some sort of API access facade that the user interface interacts with to execute some sort of business functionality. The API access facade typically takes on the responsibility of orchestrating the business request from the user interface. For example, consider a business request from the user interface to place an order (also known as catalog checkout). This single request, received by the API access facade within the OrderService domain service, internally orchestrates the single business request: place the order, generate an order ID, apply the payment, and update the product inventory for each product ordered. In the microservices architecture style, this would likely involve the orchestration of many separately deployed remote single-purpose services to complete the request. This difference between internal class-level orchestration and external service orchestration points to one of the many significant differences between service-based architecture and microservices in terms of granularity. 无论服务设计如何,域服务必须包含某种 API 访问外观,用户界面通过它与之交互以执行某种业务功能。API 访问外观通常负责协调来自用户界面的业务请求。例如,考虑用户界面发出的一个业务请求以下订单(也称为目录结账)。这个请求由 OrderService 域服务中的 API 访问外观接收,内部协调单个业务请求:下订单、生成订单 ID、处理付款,并更新每个订购产品的库存。在微服务架构风格中,这可能涉及协调许多单独部署的远程单一目的服务以完成请求。内部类级别的协调与外部服务协调之间的这种差异指向了基于服务的架构与微服务在粒度方面的许多显著差异之一。
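The internal, class-level orchestration described above might look roughly like the following sketch. All names (`OrderServiceFacade`, `Inventory`, `Payment`), the order ID format, and the payment rule are hypothetical; the point is only that every step is an in-process method call inside one deployed domain service.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

// Internal classes of one hypothetical OrderService deployment.
class Inventory {
    private final Map<String, Integer> stock = new HashMap<>();

    void setStock(String product, int quantity) {
        stock.put(product, quantity);
    }

    int stockOf(String product) {
        return stock.getOrDefault(product, 0);
    }

    void decrement(String product, int quantity) {
        stock.put(product, stockOf(product) - quantity);
    }
}

class Payment {
    // Invented rule for the sketch: an empty card number is declined.
    boolean apply(String cardNumber, double amount) {
        return !cardNumber.isEmpty();
    }
}

// API access facade: the single entry point the user interface calls.
class OrderServiceFacade {
    private final AtomicInteger ids = new AtomicInteger(1000);
    private final Inventory inventory;
    private final Payment payment = new Payment();

    OrderServiceFacade(Inventory inventory) {
        this.inventory = inventory;
    }

    // Orchestrates the whole request with in-process calls, not remote services.
    String placeOrder(String product, int quantity, String cardNumber, double amount) {
        String orderId = "ORD-" + ids.incrementAndGet();  // generate an order ID
        if (!payment.apply(cardNumber, amount)) {         // apply the payment
            throw new IllegalStateException("Payment cannot be applied");
        }
        inventory.decrement(product, quantity);           // update the product inventory
        return orderId;
    }
}

class FacadeDemo {
    public static void main(String[] args) {
        Inventory inventory = new Inventory();
        inventory.setStock("phone-case", 5);
        OrderServiceFacade facade = new OrderServiceFacade(inventory);
        System.out.println(facade.placeOrder("phone-case", 2, "4111-0000", 19.98));
        System.out.println(inventory.stockOf("phone-case"));
    }
}
```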
Because domain services are coarse-grained, regular ACID (atomicity, consistency, isolation, durability) database transactions involving database commits and rollbacks are used to ensure database integrity within a single domain service. Highly distributed architectures like microservices, on the other hand, usually have fine-grained services and use a distributed transaction technique known as BASE transactions (basic availability, soft state, eventual consistency) that relies on eventual consistency and hence does not support the same level of database integrity as ACID transactions in a service-based architecture. 由于领域服务是粗粒度的,因此使用常规的 ACID(原子性、一致性、隔离性、持久性)数据库事务,包括数据库提交和回滚,以确保单个领域服务内的数据库完整性。另一方面,像微服务这样的高度分布式架构通常具有细粒度的服务,并使用一种称为 BASE 事务(基本可用性、软状态、最终一致性)的分布式事务技术,这种技术依赖于最终一致性,因此不支持与基于服务的架构中的 ACID 事务相同级别的数据库完整性。
To illustrate this point, consider the example of a catalog checkout process within a service-based architecture. Suppose the customer places an order and the credit card used for payment has expired. Since this is an atomic transaction within the same service, everything added to the database can be removed using a rollback and a notice sent to the customer stating that the payment cannot be applied. Now consider this same process in a microservices architecture with smaller fine-grained services. First, the OrderPlacement service would accept the request, create the order, generate an order ID, and insert the order into the order tables. Once this is done, the order service would then make a remote call to the PaymentService, which would try to apply the payment. If the payment cannot be applied due to an expired credit card, then the order cannot be placed and the data is in an inconsistent state (the order information has already been inserted but has not been approved). In this case, what about the inventory for that order? Should it be marked as ordered and decremented? What if the inventory is low and another customer wishes to purchase the item? Should that new customer be allowed to buy it, or should the reserved inventory be reserved for the customer trying to place the order with an expired credit card? These are just a few of the questions that would need to be addressed when orchestrating a business process with multiple finer-grained services. 为了说明这一点,考虑在基于服务的架构中目录结账过程的例子。假设客户下了订单,而用于支付的信用卡已过期。由于这是同一服务中的原子事务,因此可以使用回滚将所有添加到数据库中的内容删除,并向客户发送通知,说明无法应用支付。现在考虑在微服务架构中使用更小的细粒度服务的相同过程。首先,OrderPlacement 服务将接受请求,创建订单,生成订单 ID,并将订单插入订单表中。一旦完成,订单服务将远程调用 PaymentService,尝试应用支付。如果由于信用卡过期而无法应用支付,则订单无法下达,数据处于不一致状态(订单信息已插入但尚未批准)。在这种情况下,该订单的库存怎么办?应该将其标记为已订购并减少库存吗?如果库存不足,而另一位客户希望购买该商品怎么办? 是否应该允许那个新客户购买它,还是应该将保留的库存保留给试图用过期信用卡下订单的客户?这些只是协调具有多个细粒度服务的业务流程时需要解决的一些问题。
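The single-service half of this comparison can be sketched with a toy, in-memory stand-in for a database transaction. The `OrderDatabase` class is an invented simplification, not a real persistence API; with an actual relational database the same begin, commit, and rollback steps would typically go through JDBC's `Connection.setAutoCommit(false)`, `commit()`, and `rollback()`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a transactional database: changes are staged until commit.
class OrderDatabase {
    private final List<String> committedRows = new ArrayList<>();
    private List<String> pending = null;

    void begin() {
        pending = new ArrayList<>(committedRows);
    }

    void insert(String row) {
        pending.add(row);
    }

    void commit() {
        committedRows.clear();
        committedRows.addAll(pending);
        pending = null;
    }

    void rollback() {
        pending = null; // discard everything staged since begin()
    }

    int rowCount() {
        return committedRows.size();
    }
}

class CheckoutDemo {
    // One atomic unit of work inside a single domain service.
    static boolean checkout(OrderDatabase db, String orderId, boolean cardExpired) {
        db.begin();
        try {
            db.insert("order:" + orderId);
            if (cardExpired) {
                throw new IllegalStateException("Payment cannot be applied");
            }
            db.insert("payment:" + orderId);
            db.commit();
            return true;
        } catch (IllegalStateException e) {
            db.rollback(); // the order row disappears along with everything else
            return false;
        }
    }

    public static void main(String[] args) {
        OrderDatabase db = new OrderDatabase();
        System.out.println(checkout(db, "1001", true) + " rows=" + db.rowCount());
        System.out.println(checkout(db, "1002", false) + " rows=" + db.rowCount());
    }
}
```

In the microservices version there is no single rollback that spans the order tables and the payment service, which is why the inventory and consistency questions above arise.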
Domain services, being coarse-grained, allow for better data integrity and consistency, but there is a trade-off. With service-based architecture, a change made to the order placement functionality in the OrderService would require testing the entire coarse-grained service (including payment processing), whereas with microservices the same change would only impact a small OrderPlacement service (requiring no change to the PaymentService). Furthermore, because more code is being deployed, there is more risk with service-based architecture that something might break (including payment processing), whereas with microservices each service has a single responsibility, hence less chance of breaking other functionality when being changed.
Database Partitioning
Although not required, services within a service-based architecture usually share a single, monolithic database due to the small number of services (4 to 12) within a given application context. This database coupling can present an issue with respect to database table schema changes. If not done properly, a table schema change can potentially impact every service, making database changes a very costly task in terms of effort and coordination.
Within a service-based architecture, the shared class files representing the database table schemas (usually referred to as entity objects) reside in a custom shared library used by all the domain services (such as a JAR file or DLL). Shared libraries might also contain SQL code. The practice of creating a single shared library of entity objects is the least effective way of implementing service-based architecture. Any change to the database table structures would also require a change to the single shared library containing all of the corresponding entity objects, thus requiring a change and redeployment to every service, regardless of whether or not the services actually access the changed table. Shared library versioning can help address this issue, but nevertheless, with a single shared library it is difficult to know which services are actually impacted by the table change without manual, detailed analysis. This single shared library scenario is illustrated in Figure 13-6.
Figure 13-6. Using a single shared library for database entity objects
One way to mitigate the impact and risk of database changes is to logically partition the database and manifest the logical partitioning through federated shared libraries. Notice in Figure 13-7 that the database is logically partitioned into five separate domains (common, customer, invoicing, order, and tracking). Also notice that there are five corresponding shared libraries used by the domain services matching the logical partitions in the database. Using this technique, changes to a table within a particular logical domain (in this case, invoicing) match the corresponding shared library containing the entity objects (and possibly SQL as well), impacting only those services using that shared library, which in this case is the invoicing service. No other services are impacted by this change.
Figure 13-7. Using multiple shared libraries for database entity objects
Notice in Figure 13-7 the use of the common domain and the corresponding common_entities_lib shared library used by all services. This is a relatively common occurrence. These tables are common to all services, and as such, changes to these tables require coordination of all services accessing the shared database. One way to mitigate changes to these tables (and corresponding entity objects) is to lock the common entity objects in the version control system and restrict change access to only the database team. This helps control change and emphasizes the significance of changes to the common tables used by all services.
Make the logical partitioning in the database as fine-grained as possible while still maintaining well-defined data domains to better control database changes within a service-based architecture.
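The effect of federated shared libraries on change impact can be sketched in a few lines. The following is only an illustration under stated assumptions: apart from common_entities_lib, every library, service, and table name below is hypothetical, and a real system would derive these mappings from its build metadata rather than hard-code them.

```python
# Hypothetical service-to-library dependencies. Only common_entities_lib is
# named in the text; the other library and service names are illustrative.
SERVICE_LIBRARIES = {
    "CustomerService": {"common_entities_lib", "customer_entities_lib"},
    "InvoicingService": {"common_entities_lib", "invoicing_entities_lib"},
    "OrderService": {"common_entities_lib", "order_entities_lib"},
    "TrackingService": {"common_entities_lib", "tracking_entities_lib"},
}

# Hypothetical assignment of tables to the five logical domains.
TABLE_DOMAIN = {
    "customer": "customer",
    "invoice": "invoicing",
    "customer_order": "order",
    "shipment": "tracking",
    "country_code": "common",
}


def impacted_services(table: str) -> list[str]:
    """Return the services that depend on the entity library for `table`."""
    library = f"{TABLE_DOMAIN[table]}_entities_lib"
    return sorted(
        service
        for service, libraries in SERVICE_LIBRARIES.items()
        if library in libraries
    )
```

Under these assumptions, a change to the invoice table touches only the invoicing service, while a change to a common table touches every service, which is why changes to the common entity objects deserve the tighter control described above.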
Example Architecture
To illustrate the flexibility and power of the service-based architecture style, consider the real-world example of an electronic recycling system used to recycle old electronic devices (such as an iPhone or Galaxy cell phone). The processing flow of recycling old electronic devices works as follows: first, the customer asks the company (via a website or kiosk) how much money they can get for the old electronic device (called quoting). If satisfied, the customer will send the electronic device to the recycling company, which in turn will receive the physical device (called receiving). Once received, the recycling company will then assess the device to determine if the device is in good working condition or not (called assessment). If the device is in good working condition, the company will send the customer the money promised for the device (called accounting). Through this process, the customer can go to the website at any time to check on the status of the item (called item status). Based on the assessment, the device is then recycled by either safely destroying it or reselling it (called recycling). Finally, the company periodically runs ad hoc and scheduled financial and operational reports based on recycling activity (called reporting).
Figure 13-8 illustrates this system using a service-based architecture. Notice how each domain area identified in the prior description is implemented as a separately deployed independent domain service. Scalability can be achieved by only scaling those services needing higher throughput (in this case, the customer-facing Quoting service and ItemStatus service). The other services do not need to scale, and as such only require a single service instance.
Also notice how the user interface applications are federated into their respective domains: Customer Facing, Receiving, and Recycling and Accounting. This federation allows for fault tolerance of the user interface, scalability, and security (external customers have no network path to internal functionality). Finally, notice in this example that there are two separate physical databases: one for external customer-facing operations, and one for internal operations. This allows the internal data and operations to reside in a separate network zone from the external operations (denoted by the vertical line), providing much better security access restrictions and data protection. One-way access through the firewall allows internal services to access and update the customer-facing information, but not vice versa. Alternatively, depending on the database being used, internal table mirroring and table synchronization could also be used.
Figure 13-8. Electronics recycling example using service-based architecture
This example illustrates many of the benefits of the service-based architecture approach: scalability, fault tolerance, and security (data and functionality protection and access), in addition to agility, testability, and deployability. For example, the Assessment service is changed constantly to add assessment rules as new products are received. This frequent change is isolated to a single domain service, providing agility (the ability to respond quickly to change), as well as testability (the ease of and completeness of testing) and deployability (the ease, frequency, and risk of deployment).
Architecture Characteristics Ratings
A one-star rating in the characteristics ratings table in Figure 13-9 means the specific architecture characteristic isn't well supported in the architecture, whereas a five-star rating means the architecture characteristic is one of the strongest features in the architecture style. The definition for each characteristic identified in the scorecard can be found in Chapter 4.
| Architecture characteristic | Star rating |
| :---: | :---: |
| Partitioning type | Domain |
| Number of quanta | 1 to many |
| Deployability | ★★★★ |
| Elasticity | ★★ |
| Evolutionary | (rating not legible in source) |
| Fault tolerance | ★★★★ |
| Modularity | (rating not legible in source) |
| Overall cost | (rating not legible in source) |
| Performance | (rating not legible in source) |
| Reliability | (rating not legible in source) |
| Scalability | ★★★ |
| Simplicity | (rating not legible in source) |
| Testability | ★★★★ |
Figure 13-9. Service-based architecture characteristics ratings
Service-based architecture is a domain-partitioned architecture, meaning that the structure is driven by the domain rather than a technical consideration (such as presentation logic or persistence logic). Consider the prior example of the electronic recycling application. Each service, being a separately deployed unit of software, is scoped to a specific domain (such as item assessment). Changes made within this domain only impact the specific service, the corresponding user interface, and the corresponding database. Nothing else needs to be modified to support a specific assessment change.
Being a distributed architecture, the number of quanta can be greater than or equal to one. Even though there may be anywhere from 4 to 12 separately deployed services, if those services all share the same database or user interface, that entire system would be only a single quantum. However, as illustrated in “Topology Variants” on page 165, both the user interface and database can be federated, resulting in multiple quanta within the overall system. In the electronics recycling example, the system contains two quanta, as illustrated in Figure 13-10: one for the customer-facing portion of the application containing a separate customer user interface, database, and set of services (Quoting and Item Status); and one for the internal operations of receiving, assessing, and recycling the electronic device. Notice that even though the internal operations quantum contains separately deployed services and two separate user interfaces, they all share the same database, making the internal operations portion of the application a single quantum.
Figure 13-10. Separate quanta in a service-based architecture
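The counting rule used in this paragraph (services that share a database or a user interface belong to the same quantum) can be expressed as a small grouping computation. The sketch below is a deliberate simplification of the full quantum definition, and the assignment of internal services to the two internal user interfaces is an assumption made for illustration; the text does not spell it out.

```python
from collections import defaultdict


def count_quanta(deployments: dict) -> int:
    """Count groups of services linked by a shared user interface or database.

    `deployments` maps a service name to a (user_interface, database) pair.
    """
    parent = {service: service for service in deployments}

    def find(service):
        # Follow parent links to the group representative, compressing the path.
        while parent[service] != service:
            parent[service] = parent[parent[service]]
            service = parent[service]
        return service

    # Index services by each resource (UI or database) they depend on.
    by_resource = defaultdict(list)
    for service, (ui, database) in deployments.items():
        by_resource[("ui", ui)].append(service)
        by_resource[("db", database)].append(service)

    # Services sharing any resource are merged into one group.
    for services in by_resource.values():
        for other in services[1:]:
            parent[find(other)] = find(services[0])

    return len({find(service) for service in deployments})


# Electronics recycling example; UI assignments for internal services are assumed.
RECYCLING_SYSTEM = {
    "Quoting": ("CustomerUI", "CustomerDB"),
    "ItemStatus": ("CustomerUI", "CustomerDB"),
    "Receiving": ("ReceivingUI", "InternalDB"),
    "Assessment": ("ReceivingUI", "InternalDB"),
    "Accounting": ("RecyclingAccountingUI", "InternalDB"),
    "Recycling": ("RecyclingAccountingUI", "InternalDB"),
    "Reporting": ("RecyclingAccountingUI", "InternalDB"),
}
```

With this input, count_quanta(RECYCLING_SYSTEM) yields two groups, and pointing every service at one shared database collapses the system to a single group regardless of how many user interfaces exist.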
Although service-based architecture doesn’t contain any five-star ratings, it nevertheless rates high (four stars) in many important and vital areas. Breaking apart an application into separately deployed domain services using this architecture style allows for faster change (agility), better test coverage due to the limited scope of the domain (testability), and the ability for more frequent deployments carrying less risk than a large monolith (deployability). These three characteristics lead to better time-to-market, allowing an organization to deliver new features and bug fixes at a relatively high rate.
Fault tolerance and overall application availability also rate high for service-based architecture. Even though domain services tend to be coarse-grained, the four-star rating comes from the fact that with this architecture style, services are usually self-contained and do not leverage interservice communication due to database sharing and code sharing. As a result, if one domain service goes down (e.g., the Receiving service in the electronic recycling application example), it doesn’t impact any of the other six services.
Scalability only rates three stars due to the coarse-grained nature of the services, and correspondingly, elasticity only two stars. Although programmatic scalability and elasticity are certainly possible with this architecture style, more functionality is replicated than with finer-grained services (such as microservices) and as such is not as efficient in terms of machine resources and not as cost-effective. Typically there are only single service instances with service-based architecture unless there is a need for better throughput or failover. A good example of this is the electronics recycling application: only the Quoting and Item Status services need to scale to support high customer volumes, but the other operational services only require single instances, making it easier to support such things as single in-memory caching and database connection pooling.
Simplicity and overall cost are two other drivers that differentiate this architecture style from other, more expensive and complex distributed architectures, such as microservices, event-driven architecture, or even space-based architecture. This makes service-based architecture one of the easiest and most cost-effective distributed architectures to implement. While this is an attractive proposition, there is a trade-off to this cost savings and simplicity in all of the characteristics containing four-star ratings. The higher the cost and complexity, the better these ratings become.
Service-based architectures tend to be more reliable than other distributed architectures due to the coarse-grained nature of the domain services. Larger services mean less network traffic to and between services, fewer distributed transactions, and less bandwidth used, therefore increasing overall reliability with respect to the network.
When to Use This Architecture Style
The flexibility of this architecture style (see “Topology Variants” on page 165) combined with the number of three-star and four-star architecture characteristics ratings make service-based architecture one of the most pragmatic architecture styles available. While there are certainly other distributed architecture styles that are much more powerful, some companies find that power comes at too steep of a price, while others find that they quite simply don’t need that much power. It’s like having the power, speed, and agility of a Ferrari used only for driving back and forth to work in rush-hour traffic at 50 kilometers per hour: sure, it looks cool, but what a waste of resources and money!
Service-based architecture is also a natural fit when doing domain-driven design. Because services are coarse-grained and domain-scoped, each domain fits nicely into a separately deployed domain service. Each service in service-based architecture encompasses a particular domain (such as recycling in the electronic recycling application), therefore compartmentalizing that functionality into a single unit of software, making it easier to apply changes to that domain.
Maintaining and coordinating database transactions is always an issue with distributed architectures in that they typically rely on eventual consistency rather than traditional ACID (atomicity, consistency, isolation, and durability) transactions. However, service-based architecture preserves ACID transactions better than any other distributed architecture due to the coarse-grained nature of the domain services. There are cases where the user interface or API gateway might orchestrate two or more domain services, and in these cases the transaction would need to rely on sagas and BASE transactions. However, in most cases the transaction is scoped to a particular domain service, allowing for the traditional commit and rollback transaction functionality found in most monolithic applications.
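To make the commit-and-rollback point concrete, here is a minimal sketch of a coarse-grained order service that writes an order and its payment inside one local database transaction. It uses Python's standard sqlite3 module with an in-memory database; the table layout, function names, and the card_is_valid flag are all assumptions for illustration, not part of the architecture style itself.

```python
import sqlite3


def open_order_database() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.execute("CREATE TABLE payments (order_id INTEGER, amount REAL)")
    conn.commit()
    return conn


def place_order(conn, customer, amount, card_is_valid=True):
    """Insert the order and its payment atomically; return the order ID or None."""
    try:
        # Used as a context manager, the connection commits when the block
        # succeeds and rolls back when an exception escapes the block.
        with conn:
            cursor = conn.execute(
                "INSERT INTO orders (customer, amount) VALUES (?, ?)",
                (customer, amount),
            )
            if not card_is_valid:
                raise ValueError("payment declined")
            conn.execute(
                "INSERT INTO payments (order_id, amount) VALUES (?, ?)",
                (cursor.lastrowid, amount),
            )
            return cursor.lastrowid
    except ValueError:
        return None
```

Because order placement and payment live in the same service and database in this sketch, a declined payment leaves no orphaned order row. Splitting the two steps into separate services with separate databases would remove that guarantee and call for a saga or similar compensation logic.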
Lastly, service-based architecture is a good choice for achieving a good level of architectural modularity without having to get tangled up in the complexities and pitfalls of granularity. As services become more fine-grained, issues surrounding orchestration and choreography start to appear. Both orchestration and choreography are required when multiple services must be coordinated to complete a certain business transaction. Orchestration is the coordination of multiple services through the use of a separate mediator service that controls and manages the workflow of the transaction (like a conductor in an orchestra). Choreography, on the other hand, is the coordination of multiple services by which each service talks to one another without the use of a central mediator (like dancers in a dance). As services become more fine-grained, both orchestration and choreography are necessary to tie the services together to complete the business transaction. However, because services within a service-based architecture tend to be more coarse-grained, they don’t require coordination nearly as much as other distributed architectures.
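The difference between the two coordination styles can be shown with a toy, in-process example. This is only a sketch: the step names and service classes are hypothetical, and direct method calls stand in for what would be network calls or messages between separately deployed services.

```python
def reserve_stock(order_id, log):
    log.append(f"stock reserved for {order_id}")


def charge_payment(order_id, log):
    log.append(f"payment charged for {order_id}")


def orchestrate(order_id):
    """Orchestration: a single mediator owns the workflow and calls each step."""
    log = []
    for step in (reserve_stock, charge_payment):
        step(order_id, log)
    return log


class PaymentService:
    def __init__(self, log):
        self.log = log

    def on_stock_reserved(self, order_id):
        charge_payment(order_id, self.log)


class StockService:
    def __init__(self, log, downstream):
        self.log = log
        self.downstream = downstream

    def on_order_placed(self, order_id):
        reserve_stock(order_id, self.log)
        # Choreography: this service itself tells the next one what happened.
        self.downstream.on_stock_reserved(order_id)


def choreograph(order_id):
    """Choreography: no mediator; each service hands off to the next."""
    log = []
    StockService(log, PaymentService(log)).on_order_placed(order_id)
    return log
```

Both functions produce the same sequence of work. What differs is where the knowledge of that sequence lives: in one place for orchestration, and spread across the services for choreography.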
CHAPTER 14
Event-Driven Architecture Style
The event-driven architecture style is a popular distributed asynchronous architecture style used to produce highly scalable and high-performance applications. It is also highly adaptable and can be used for small applications as well as large, complex ones. Event-driven architecture is made up of decoupled event processing components that asynchronously receive and process events. It can be used as a standalone architecture style or embedded within other architecture styles (such as an event-driven microservices architecture).
Most applications follow what is called a request-based model (illustrated in Figure 14-1). In this model, requests made to the system to perform some sort of action are sent to a request orchestrator. The request orchestrator is typically a user interface, but it can also be implemented through an API layer or enterprise service bus. The role of the request orchestrator is to deterministically and synchronously direct the request to various request processors. The request processors handle the request, either retrieving or updating information in a database.
A good example of the request-based model is a request from a customer to retrieve their order history for the past six months. Retrieving order history information is a data-driven, deterministic request made to the system for data within a specific context, not an event happening that the system must react to.
An event-based model, on the other hand, reacts to a particular situation and takes action based on that event. An example of an event-based model is submitting a bid for a particular item within an online auction. Submitting the bid is not a request made to the system, but rather an event that happens after the current asking price is announced. The system must respond to this event by comparing the bid to others received at the same time to determine who is the current highest bidder.
Figure 14-1. Request-based model
Topology
There are two primary topologies within event-driven architecture: the mediator topology and the broker topology. The mediator topology is commonly used when you require control over the workflow of an event process, whereas the broker topology is used when you require a high degree of responsiveness and dynamic control over the processing of an event. Because the architecture characteristics and implementation strategies differ between these two topologies, it is important to understand each one to know which is best suited for a particular situation.
Broker Topology
The broker topology differs from the mediator topology in that there is no central event mediator. Rather, the message flow is distributed across the event processor components in a chain-like broadcasting fashion through a lightweight message broker (such as RabbitMQ, ActiveMQ, HornetQ, and so on). This topology is useful when you have a relatively simple event processing flow and you do not need central event orchestration and coordination.
There are four primary architecture components within the broker topology: an initiating event, the event broker, an event processor, and a processing event. The initiating event is the initial event that starts the entire event flow, whether it be a simple event like placing a bid in an online auction or more complex events in a health benefits system like changing a job or getting married. The initiating event is sent to an event channel in the event broker for processing. Since there is no mediator component in the broker topology managing and controlling the event, a single event processor accepts the initiating event from the event broker and begins the processing of that event. The event processor that accepted the initiating event performs a specific task associated with the processing of that event, then asynchronously advertises what it did to the rest of the system by creating what is called a processing event. This processing event is then asynchronously sent to the event broker for further processing, if needed. Other event processors listen to the processing event, react to that event by doing something, then advertise through a new processing event what they did. This process continues until no one is interested in what a final event processor did. Figure 14-2 illustrates this event processing flow.
The event broker component is usually federated (meaning multiple domain-based clustered instances), where each federated broker contains all of the event channels used within the event flow for that particular domain. Because of the decoupled asynchronous fire-and-forget broadcasting nature of the broker topology, topics (or topic exchanges in the case of AMQP) are usually used in the broker topology using a publish-and-subscribe messaging model.
Figure 14-2. Broker topology
It is always a good practice within the broker topology for each event processor to advertise what it did to the rest of the system, regardless of whether or not any other event processor cares about what that action was. This practice provides architectural extensibility if additional functionality is required for the processing of that event. For example, suppose as part of a complex event process, as illustrated in Figure 14-3, an email is generated and sent to a customer notifying them of a particular action taken. The Notification event processor would generate and send the email, then advertise that action to the rest of the system through a new processing event sent to a topic. However, in this case, no other event processors are listening for events on that topic, and as such the message simply goes away.
Figure 14-3. Notification event is sent but ignored
This is a good example of architectural extensibility. While it may seem like a waste of resources sending messages that are ignored, it is not. Suppose a new requirement comes along to analyze emails that have been sent to customers. This new event processor can be added to the overall system with minimal effort because the email information is available via the email topic to the new analyzer without having to add any additional infrastructure or apply any changes to other event processors.
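The extensibility argument can be illustrated with a tiny in-memory publish-and-subscribe sketch. It is not how a real broker such as RabbitMQ is programmed: delivery here is synchronous and in-process, and the TopicBroker class, the send_email function, and the email-sent topic name are assumptions chosen for the example.

```python
from collections import defaultdict


class TopicBroker:
    """Toy stand-in for a message broker with publish-and-subscribe topics."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        """Deliver `event` to every current subscriber; return the delivery count."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(event)
        return len(handlers)


def send_email(broker, customer, body):
    """Stand-in for the Notification event processor.

    The actual email delivery is omitted; the function only advertises the
    action to the rest of the system through a processing event.
    """
    event = {"customer": customer, "body": body}
    return broker.publish("email-sent", event)


broker = TopicBroker()
ignored = send_email(broker, "pat@example.com", "Your order was created")

# Later requirement: analyze sent emails. Only a new subscriber is added;
# send_email and the broker are left untouched.
analyzed = []
broker.subscribe("email-sent", analyzed.append)
delivered = send_email(broker, "pat@example.com", "Your order has shipped")
```

In this sketch the first event reaches no one and simply disappears, while the second reaches the newly added analyzer, without any change to the processor that publishes it.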
To illustrate how the broker topology works, consider the processing flow in a typical retail order entry system, as illustrated in Figure 14-4, where an order is placed for an item (say, a book like this one). In this example, the OrderPlacement event processor receives the initiating event (PlaceOrder), inserts the order in a database table, and returns an order ID to the customer. It then advertises to the rest of the system that it created an order through an order-created processing event. Notice that three event processors are interested in that event: the Notification event processor, the Payment event processor, and the Inventory event processor. All three of these event processors perform their tasks in parallel.
Figure 14-4. Example of the broker topology
The Notification event processor receives the order-created processing event and emails the customer. It then generates another processing event (email-sent). Notice that no other event processors are listening to that event. This is normal and illustrates the previous example describing architectural extensibility: an in-place hook so that other event processors can eventually tap into that event feed, if needed.
The Inventory event processor also listens for the order-created processing event and decrements the corresponding inventory for that book. It then advertises this action through an inventory-updated processing event, which is in turn picked up by the Warehouse event processor to manage the corresponding inventory between warehouses, reordering items if supplies get too low.
The Payment event processor also receives the order-created processing event and charges the customer’s credit card for the order that was just created. Notice in Figure 14-4 that two events are generated as a result of the actions taken by the Payment event processor: one to notify the rest of the system that the payment was applied (payment-applied) and one processing event to notify the rest of the system that the payment was denied (payment-denied). Notice that the Notification event processor is interested in the payment-denied processing event, because it must, in turn, send an email to the customer informing them that they must update their credit card information or choose a different payment method.
The OrderFulfillment event processor listens to the payment-applied processing event and does order picking and packing. Once completed, it then advertises to the rest of the system that it fulfilled the order via an order-fulfilled processing event. Notice that both the Notification event processor and the Shipping event processor listen to this processing event. Concurrently, the Notification event processor notifies the customer that the order has been fulfilled and is ready for shipment, and at the same time the Shipping event processor selects a shipping method. The Shipping event processor ships the order and sends out an order-shipped processing event, which the Notification event processor also listens for to notify the customer of the order status change.
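The order entry flow described above can be traced with a small simulation. The sketch below wires the event processors to topics as the text describes, using a hypothetical in-memory Broker class with a FIFO queue in place of a real asynchronous message broker. Two simplifications are assumptions: the Notification processor emits email-sent for every email it sends, and each processor does nothing except publish its processing event.

```python
from collections import defaultdict, deque


class Broker:
    """Toy broker: queues published events and dispatches them in FIFO order."""

    def __init__(self):
        self.subscribers = defaultdict(list)
        self.pending = deque()
        self.trace = []  # topics in the order they were dispatched

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, payload):
        self.pending.append((topic, payload))

    def run(self):
        while self.pending:
            topic, payload = self.pending.popleft()
            self.trace.append(topic)
            for handler in list(self.subscribers[topic]):
                handler(payload)


def build_order_system(card_is_valid: bool) -> Broker:
    broker = Broker()

    def order_placement(order):
        broker.publish("order-created", order)

    def notification(order):
        broker.publish("email-sent", order)

    def payment(order):
        outcome = "payment-applied" if card_is_valid else "payment-denied"
        broker.publish(outcome, order)

    def inventory(order):
        broker.publish("inventory-updated", order)

    def warehouse(order):
        pass  # would rebalance stock between warehouses

    def order_fulfillment(order):
        broker.publish("order-fulfilled", order)

    def shipping(order):
        broker.publish("order-shipped", order)

    broker.subscribe("PlaceOrder", order_placement)
    for topic in ("order-created", "payment-denied", "order-fulfilled", "order-shipped"):
        broker.subscribe(topic, notification)
    broker.subscribe("order-created", payment)
    broker.subscribe("order-created", inventory)
    broker.subscribe("inventory-updated", warehouse)
    broker.subscribe("payment-applied", order_fulfillment)
    broker.subscribe("order-fulfilled", shipping)
    return broker


def trace_order(card_is_valid: bool) -> list:
    """Submit one PlaceOrder initiating event and return the dispatched topics."""
    broker = build_order_system(card_is_valid)
    broker.publish("PlaceOrder", {"item": "book"})
    broker.run()
    return broker.trace
```

Calling trace_order(True) follows the chain through to order-shipped, while trace_order(False) stops after payment-denied and its notification email. No processor in the sketch knows the overall sequence; the flow emerges entirely from the subscriptions.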
In analyzing the prior example, notice that all of the event processors are highly decoupled and independent of each other. The best way to understand the broker topology is to think about it as a relay race. In a relay race, runners hold a baton (a wooden stick) and run for a certain distance (say 1.5 kilometers), then hand off the baton to the next runner, and so on down the chain until the last runner crosses the finish line. In relay races, once a runner hands off the baton, that runner is done with the race and moves on to other things. This is also true with the broker topology. Once an event processor hands off the event, it is no longer involved with the processing of that specific event and is available to react to other initiating or processing events. In addition, each event processor can scale independently from one another to handle varying load conditions or backups in the processing within that event. The topics provide the back pressure point if an event processor goes down or slows down due to some environment issue.
While performance, responsiveness, and scalability are all great benefits of the broker topology, there are also some negatives about it. First of all, there is no control over the overall workflow associated with the initiating event (in this case, the PlaceOrder event). It is very dynamic based on various conditions, and no one in the system really knows when the business transaction of placing an order is actually complete. Error handling is also a big challenge with the broker topology. Because there is no mediator monitoring or controlling the business transaction, if a failure occurs (such as the Payment event processor crashing and not completing its assigned task), no one in the system is aware of that crash. The business process gets stuck and is unable to move without some sort of automated or manual intervention. Furthermore, all other processes are moving along without regard for the error. For example, the Inventory event processor still decrements the inventory, and all other event processors react as though everything is fine.
The ability to restart a business transaction (recoverability) is also something not supported with the broker topology. Because other actions have asynchronously been taken through the initial processing of the initiating event, it is not possible to resubmit the initiating event. No component in the broker topology is aware of the state or even owns the state of the original business request, and therefore no one is responsible in this topology for restarting the business transaction (the initiating event) and knowing where it left off. The advantages and disadvantages of the broker topology are summarized in Table 14-1.
Table 14-1. Trade-offs of the broker topology
| Advantages | Disadvantages |
| :--- | :--- |
| Highly decoupled event processors | Workflow control |
| High scalability | Error handling |
| High responsiveness | Recoverability |
| High performance | Restart capabilities |
| High fault tolerance | Data inconsistency |
Mediator Topology 中介拓扑
The mediator topology of event-driven architecture addresses some of the shortcomings of the broker topology described in the previous section. Central to this topology is an event mediator, which manages and controls the workflow for initiating events that require the coordination of multiple event processors. The architecture components that make up the mediator topology are an initiating event, an event queue, an event mediator, event channels, and event processors. 事件驱动架构的中介拓扑解决了上一节中描述的代理拓扑的一些缺点。该拓扑的核心是一个事件中介,它管理和控制需要多个事件处理器协调的事件启动工作流。构成中介拓扑的架构组件包括一个启动事件、一个事件队列、一个事件中介、事件通道和事件处理器。
Like in the broker topology, the initiating event is the event that starts the whole eventing process. Unlike the broker topology, the initiating event is sent to an initiating event queue, which is accepted by the event mediator. The event mediator only knows the steps involved in processing the event and therefore generates corresponding processing events that are sent to dedicated event channels (usually queues) in a point-to-point messaging fashion. Event processors then listen to dedicated event channels, process the event, and usually respond back to the mediator that they have completed their work. Unlike the broker topology, event processors within the mediator topology do not advertise what they did to the rest of the system. The mediator topology is illustrated in Figure 14-5. 在代理拓扑中,启动事件是启动整个事件处理过程的事件。与代理拓扑不同,启动事件被发送到启动事件队列,由事件中介接受。事件中介只知道处理事件所涉及的步骤,因此生成相应的处理事件,这些事件以点对点消息传递的方式发送到专用事件通道(通常是队列)。事件处理器然后监听专用事件通道,处理事件,并通常向中介响应他们已完成工作。与代理拓扑不同,中介拓扑中的事件处理器不会向系统的其余部分宣传他们所做的事情。中介拓扑在图 14-5 中进行了说明。
Figure 14-5. Mediator topology 图 14-5. 中介拓扑
In most implementations of the mediator topology, there are multiple mediators, usually associated with a particular domain or grouping of events. This reduces the single point of failure issue associated with this topology and also increases overall throughput and performance. For example, there might be a customer mediator that handles all customer-related events (such as new customer registration and profile update), and another mediator that handles order-related activities (such as adding an item to a shopping cart and checking out). 在大多数中介拓扑的实现中,通常有多个中介,通常与特定领域或事件组相关联。这减少了与该拓扑相关的单点故障问题,并且还提高了整体吞吐量和性能。例如,可能有一个客户中介处理所有与客户相关的事件(例如新客户注册和个人资料更新),还有另一个中介处理与订单相关的活动(例如将商品添加到购物车和结账)。
The event mediator can be implemented in a variety of ways, depending on the nature and complexity of the events it is processing. For example, for events requiring simple error handling and orchestration, a mediator such as Apache Camel, Mule ESB, or Spring Integration will usually suffice. Message flows and message routes within these types of mediators are typically custom written in programming code (such as Java or C#) to control the workflow of the event processing. 事件中介可以通过多种方式实现,具体取决于它所处理事件的性质和复杂性。例如,对于需要简单错误处理和协调的事件,像 Apache Camel、Mule ESB 或 Spring Integration 这样的中介通常就足够了。这些类型的中介中的消息流和消息路由通常是用编程代码(如 Java 或 C#)自定义编写的,以控制事件处理的工作流程。
However, if the event workflow requires lots of conditional processing and multiple dynamic paths with complex error handling directives, then a mediator such as Apache ODE or the Oracle BPEL Process Manager would be a good choice. These mediators are based on Business Process Execution Language (BPEL), an XML-like structure that describes the steps involved in processing an event. BPEL artifacts also contain structured elements used for error handling, redirection, multicasting, and so on. BPEL is a powerful but relatively complex language to learn, and as such is usually created using graphical interface tools provided in the product’s BPEL engine suite.
BPEL is good for complex and dynamic workflows, but it does not work well for those event workflows requiring long-running transactions involving human intervention throughout the event process. For example, suppose a trade is being placed through a place-trade initiating event. The event mediator accepts this event, but during the processing finds that a manual approval is required because the trade is over a certain amount of shares. In this case the event mediator would have to stop the event processing, send a notification to a senior trader for the manual approval, and wait for that approval to occur. In these cases a Business Process Management (BPM) engine such as jBPM would be required. BPEL 适用于复杂和动态的工作流,但对于那些需要在整个事件过程中涉及人工干预的长时间运行的事务事件工作流,它的效果不佳。例如,假设通过一个下单交易的启动事件进行交易。事件中介接受了这个事件,但在处理过程中发现需要手动批准,因为交易超过了一定数量的股票。在这种情况下,事件中介必须停止事件处理,向高级交易员发送手动批准的通知,并等待该批准的发生。在这些情况下,需要一个业务流程管理(BPM)引擎,例如 jBPM。
It is important to know the types of events that will be processed through the mediator in order to make the correct choice for the implementation of the event mediator. A mediator built with Apache Camel for complex and long-running events involving human interaction would be extremely difficult to write and maintain. By the same token, using a BPM engine for simple event flows would take months of wasted effort when the same thing could be accomplished in Apache Camel in a matter of days.
Given that it’s rare to have all events of one class of complexity, we recommend classifying events as simple, hard, or complex and having every event always go through a simple mediator (such as Apache Camel or Mule). The simple mediator can then interrogate the classification of the event, and based on that classification, handle the event itself or forward it to another, more complex, event mediator. In this manner, all types of events can be effectively processed by the type of mediator needed for that event. This mediator delegation model is illustrated in Figure 14-6. 鉴于同一类复杂度的所有事件都很少出现,我们建议将事件分类为简单、困难或复杂,并让每个事件始终通过一个简单的中介(例如 Apache Camel 或 Mule)。简单的中介可以询问事件的分类,并根据该分类处理事件本身或将其转发给另一个更复杂的事件中介。通过这种方式,所有类型的事件都可以由所需类型的中介有效处理。该中介委托模型在图 14-6 中进行了说明。
Figure 14-6. Delegating the event to the appropriate type of event mediator 图 14-6. 将事件委托给适当类型的事件中介
Notice in Figure 14-6 that the Simple Event Mediator generates and sends a processing event when the event workflow is simple and can be handled by the simple mediator. However, notice that when the initiating event coming into the Simple Event Mediator is classified as either hard or complex, it forwards the original initiating event to the corresponding mediator (BPEL or BPM). The Simple Event Mediator, having intercepted the original event, may still be responsible for knowing when that event is complete, or it may simply delegate the entire workflow (including client notification) to the other mediators.
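The delegation model can be sketched as a small routing function. This is an illustrative sketch only: the `InitiatingEvent` type, the `classification` field, the three helper functions, and the mapping of hard events to a BPEL mediator and complex events to a BPM mediator are assumptions, since the text does not say how an event carries its classification.

```python
from dataclasses import dataclass

# Hypothetical labels for the three classes of event named in the text.
SIMPLE, HARD, COMPLEX = "simple", "hard", "complex"


@dataclass
class InitiatingEvent:
    name: str
    classification: str  # assumed to be set before the event reaches the mediator


def handle_in_simple_mediator(event: InitiatingEvent) -> str:
    # Stand-in for generating processing events inside the simple mediator.
    return f"simple mediator handled {event.name}"


def forward_to_bpel_mediator(event: InitiatingEvent) -> str:
    return f"forwarded {event.name} to BPEL mediator"


def forward_to_bpm_mediator(event: InitiatingEvent) -> str:
    return f"forwarded {event.name} to BPM mediator"


def simple_event_mediator(event: InitiatingEvent) -> str:
    """Every event enters here; only simple events are processed locally."""
    if event.classification == SIMPLE:
        return handle_in_simple_mediator(event)
    if event.classification == HARD:
        return forward_to_bpel_mediator(event)
    if event.classification == COMPLEX:
        return forward_to_bpm_mediator(event)
    raise ValueError(f"unknown classification: {event.classification}")
```

Keeping the routing decision in one place means a new class of event only requires one more branch and one more mediator, without touching the event producers.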
To illustrate how the mediator topology works, consider the same retail order entry system example described in the prior broker topology section, but this time using the mediator topology. In this example, the mediator knows the steps required to process this particular event. This event flow (internal to the mediator component) is illustrated in Figure 14-7. 为了说明中介拓扑是如何工作的,考虑在前面的代理拓扑部分中描述的相同零售订单输入系统示例,但这次使用中介拓扑。在这个示例中,中介知道处理这个特定事件所需的步骤。该事件流(中介组件内部)在图 14-7 中进行了说明。
Figure 14-7. Mediator steps for placing an order 图 14-7. 下订单的中介步骤
In keeping with the prior example, the same initiating event (PlaceOrder) is sent to the customer-event-queue for processing. The Customer mediator picks up this initiating event and begins generating processing events based on the flow in Figure 14-7. Notice that the multiple events shown within each of steps 2, 3, and 4 are processed concurrently, while the steps themselves are processed serially. In other words, step 3 (fulfill order) must be completed and acknowledged before the customer can be notified that the order is ready to be shipped in step 4 (ship order).
Once the initiating event has been received, the Customer mediator generates a create-order processing event and sends this message to the order-placement-queue (see Figure 14-8). The OrderPlacement event processor accepts this event and validates and creates the order, returning to the mediator an acknowledgement along with the order ID. At this point the mediator might send that order ID back to the customer, indicating that the order was placed, or it might have to continue until all the steps are complete (this would be based on specific business rules about order placement).
Figure 14-8. Step 1 of the mediator example 图 14-8. 中介者示例的步骤 1
Now that step 1 is complete, the mediator moves to step 2 (see Figure 14-9) and generates three messages at the same time: email-customer, apply-payment, and adjust-inventory. These processing events are all sent to their respective queues. All three event processors receive these messages, perform their respective tasks, and notify the mediator that the processing has been completed. Notice that the mediator must wait until it receives acknowledgement from all three parallel processes before moving on to step 3. At this point, if an error occurs in one of the parallel event processors, the mediator can take corrective action to fix the problem (this is discussed later in this section in more detail).
Figure 14-9. Step 2 of the mediator example 图 14-9. 中介者示例的步骤 2
Once the mediator gets a successful acknowledgment from all of the event processors in step 2, it can move on to step 3 to fulfill the order (see Figure 14-10). Notice once again that both of these events (fulfill-order and order-stock) can occur simultaneously. The OrderFulfillment and Warehouse event processors accept these events, perform their work, and return an acknowledgement to the mediator. 一旦中介在步骤 2 中从所有事件处理器获得成功的确认,它就可以进入步骤 3 来完成订单(见图 14-10)。请注意,这两个事件(fulfill-order 和 order-stock)可以同时发生。OrderFulfillment 和 Warehouse 事件处理器接受这些事件,执行它们的工作,并向中介返回确认。
Figure 14-10. Step 3 of the mediator example 图 14-10. 中介者示例的第 3 步
Once these events are complete, the mediator then moves on to step 4 (see Figure 14-11) to ship the order. This step generates another email-customer processing event with specific information about what to do (in this case, notify the customer that the order is ready to be shipped), as well as a ship-order event. 一旦这些事件完成,调解者将进入第 4 步(见图 14-11)以发货。此步骤生成另一个电子邮件-客户处理事件,包含有关该做什么的具体信息(在这种情况下,通知客户订单已准备好发货),以及一个发货订单事件。
Figure 14-11. Step 4 of the mediator example 图 14-11. 中介者示例的第 4 步
Finally, the mediator moves to step 5 (see Figure 14-12) and generates another contextual email-customer event to notify the customer that the order has been shipped. At this point the workflow is done, and the mediator marks the initiating event flow complete and removes all state associated with the initiating event. 最后,调解者进入第 5 步(见图 14-12),生成另一个上下文电子邮件-客户事件,以通知客户订单已发货。此时,工作流完成,调解者将启动事件流标记为完成,并移除与启动事件相关的所有状态。
Figure 14-12. Step 5 of the mediator example 图 14-12. 中介者示例的第 5 步
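The five steps walked through above can be sketched as a list of steps, where the commands inside a step are dispatched concurrently and the mediator waits for every acknowledgement before starting the next step. This is a minimal sketch under stated assumptions: `send_command` is a hypothetical stand-in for sending a message to a queue and waiting for the reply, and threads stand in for the separate event processors.

```python
from concurrent.futures import ThreadPoolExecutor

# Steps of the PlaceOrder flow, using the command names given in the text.
WORKFLOW = [
    ["create-order"],
    ["email-customer", "apply-payment", "adjust-inventory"],
    ["fulfill-order", "order-stock"],
    ["email-customer", "ship-order"],
    ["email-customer"],
]


def send_command(command: str) -> str:
    """Hypothetical stand-in for sending a command and awaiting its ack."""
    return f"ack:{command}"


def run_workflow(workflow, send=send_command):
    """Run each step's commands concurrently; run the steps serially."""
    log = []
    with ThreadPoolExecutor() as pool:
        for number, commands in enumerate(workflow, start=1):
            # list() blocks until every command in this step is acknowledged,
            # so the next step cannot start early.
            acks = list(pool.map(send, commands))
            log.append((number, acks))
    return log
```

Calling `run_workflow(WORKFLOW)` returns one entry per step, in step order, each holding the acknowledgements collected for that step.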
The mediator component has knowledge and control over the workflow, something the broker topology does not have. Because the mediator controls the workflow, it can maintain event state and manage error handling, recoverability, and restart capabilities. For example, suppose in the prior example the payment was not applied due to the credit card being expired. In this case the mediator receives this error condition, and knowing the order cannot be fulfilled (step 3) until payment is applied, stops the workflow and records the state of the request in its own persistent datastore. Once payment is eventually applied, the workflow can be restarted from where it left off (in this case, the beginning of step 3). 中介组件对工作流有知识和控制,这是代理拓扑所不具备的。由于中介控制工作流,它可以维护事件状态并管理错误处理、可恢复性和重启能力。例如,假设在之前的例子中,由于信用卡过期,支付未被应用。在这种情况下,中介接收到这个错误条件,并知道在支付应用之前(步骤 3)订单无法完成,因此停止工作流并在其自己的持久数据存储中记录请求的状态。一旦支付最终被应用,工作流可以从中断的地方重新启动(在这种情况下,是步骤 3 的开始)。
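The stop-and-restart behavior can be sketched by persisting the position in the workflow. The following is a simplified illustration, not a mediator product's API: the `datastore` dictionary stands in for the mediator's persistent datastore, `send` is a hypothetical callable that returns `False` when a command fails, and commands within a step run one after another for brevity.

```python
import json

WORKFLOW = [
    ["create-order"],
    ["email-customer", "apply-payment", "adjust-inventory"],
    ["fulfill-order", "order-stock"],
    ["email-customer", "ship-order"],
    ["email-customer"],
]


def load_state(datastore, order_id):
    raw = datastore.get(order_id)
    return json.loads(raw) if raw else {"next_step": 0, "status": "new"}


def run(order_id, datastore, send):
    """Run from the saved position; stop and persist state if a command fails."""
    state = load_state(datastore, order_id)
    while state["next_step"] < len(WORKFLOW):
        results = [send(command) for command in WORKFLOW[state["next_step"]]]
        # The resume point is the beginning of the following step.
        state["next_step"] += 1
        if not all(results):
            state["status"] = "stopped"
            datastore[order_id] = json.dumps(state)
            return state
    state["status"] = "complete"
    datastore.pop(order_id, None)  # workflow done: remove the saved state
    return state
```

When `apply-payment` fails in step 2, the saved state points at step 3. Calling `run` again after the payment has been applied (assumed here to happen outside the workflow) continues from step 3 without repeating steps 1 and 2.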
Another inherent difference between the broker and mediator topology is how the processing events differ in terms of their meaning and how they are used. In the broker topology example in the previous section, the processing events were published as events that had occurred in the system (such as order-created, payment-applied, and email-sent). The event processors took some action, and other event processors reacted to that action. However, in the mediator topology, processing occurrences such as place-order, send-email, and fulfill-order are commands (things that need to happen) as opposed to events (things that have already happened). Also, in the mediator topology, a command must be processed, whereas an event can be ignored in the broker topology.
While the mediator topology addresses the issues associated with the broker topology, there are some negatives associated with the mediator topology. First of all, it is very difficult to declaratively model the dynamic processing that occurs within a complex event flow. As a result, many workflows within the mediator only handle the general processing, and a hybrid model combining both the mediator and broker topologies is used to address the dynamic nature of complex event processing (such as out-of-stock conditions or other nontypical errors). Furthermore, although the event processors can easily scale in the same manner as the broker topology, the mediator must scale as well, something that occasionally produces a bottleneck in the overall event processing flow. Finally, event processors are not as highly decoupled in the mediator topology as with the broker topology, and performance is not as good due to the mediator controlling the processing of the event. These trade-offs are summarized in Table 14-2. 虽然中介拓扑解决了与代理拓扑相关的问题,但中介拓扑也存在一些负面影响。首先,很难以声明方式建模复杂事件流中发生的动态处理。因此,中介中的许多工作流仅处理一般处理,并使用结合中介和代理拓扑的混合模型来应对复杂事件处理的动态特性(例如缺货情况或其他非典型错误)。此外,尽管事件处理器可以像代理拓扑一样轻松扩展,但中介也必须扩展,这有时会在整体事件处理流程中产生瓶颈。最后,在中介拓扑中,事件处理器的解耦程度不如代理拓扑高,性能也不如代理拓扑好,因为中介控制事件的处理。这些权衡在表 14-2 中进行了总结。
Table 14-2. Trade-offs of the mediator topology 表 14-2. 中介拓扑的权衡
| Advantages | Disadvantages |
| :--- | :--- |
| Workflow control | More coupling of event processors |
| Error handling | Lower scalability |
| Recoverability | Lower performance |
| Restart capabilities | Lower fault tolerance |
| Better data consistency | Modeling complex workflows |
The choice between the broker and mediator topology essentially comes down to a trade-off between workflow control and error handling capability versus high performance and scalability. Although performance and scalability are still good within the mediator topology, they are not as high as with the broker topology. 在代理和中介拓扑之间的选择本质上是工作流控制和错误处理能力与高性能和可扩展性之间的权衡。尽管在中介拓扑中性能和可扩展性仍然良好,但它们不如代理拓扑高。
Asynchronous Capabilities 异步能力
The event-driven architecture style offers a unique characteristic over other architecture styles in that it relies solely on asynchronous communication for both fire-and-forget processing (no response required) as well as request/reply processing (response required from the event consumer). Asynchronous communication can be a powerful technique for increasing the overall responsiveness of a system.
Consider the example illustrated in Figure 14-13 where a user is posting a comment on a website for a particular product review. Assume the comment service in this example takes 3,000 milliseconds to post the comment because it goes through several parsing engines: a bad word checker to check for unacceptable words, a grammar checker to make sure that the sentence structures are not saying something abusive, and finally a context checker to make sure the comment is about a particular product and not just a political rant. Notice in Figure 14-13 that the top path utilizes a synchronous RESTful call to post the comment: 50 milliseconds in latency for the service to receive the post, 3,000 milliseconds to post the comment, and 50 milliseconds in network latency to respond back to the user that the comment was posted. This creates a response time for the user of 3,100 milliseconds to post a comment. Now look at the bottom path and notice that with the use of asynchronous messaging, the response time from the end user’s perspective for posting a comment on the website is only 25 milliseconds (as opposed to 3,100 milliseconds). It still takes 3,025 milliseconds to post the comment (25 milliseconds to receive the message and 3,000 milliseconds to post the comment), but from the end user’s perspective it’s already been done.
Figure 14-13. Synchronous versus asynchronous communication 图 14-13. 同步与异步通信
This is a good example of the difference between responsiveness and performance. When the user does not need any information back (other than an acknowledgement or a thank you message), why make the user wait? Responsiveness is all about notifying the user that the action has been accepted and will be processed momentarily, whereas performance is about making the end-to-end process faster. Notice that nothing was done to optimize the way the comment service processes the text; in both cases it is still taking 3,000 milliseconds. Addressing performance would have meant optimizing the comment service to run all of the text and grammar parsing engines in parallel with the use of caching and other similar techniques. The bottom example in Figure 14-13 addresses the overall responsiveness of the system but not the performance of the system.
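The timings from Figure 14-13 can be checked with a few lines of arithmetic. The constant names below are hypothetical; the values are the ones given in the example.

```python
# Timings from the comment-posting example, in milliseconds.
SYNC_RECEIVE_MS = 50      # latency for the service to receive the post
POST_COMMENT_MS = 3000    # time the comment service spends processing
SYNC_RESPOND_MS = 50      # latency to respond back to the user
ASYNC_RECEIVE_MS = 25     # time to hand the message to the queue

# Synchronous path: the user waits for the whole round trip.
sync_response_ms = SYNC_RECEIVE_MS + POST_COMMENT_MS + SYNC_RESPOND_MS

# Asynchronous path: the user waits only for the message to be accepted.
async_response_ms = ASYNC_RECEIVE_MS

# End-to-end time on the asynchronous path is still dominated by processing.
async_end_to_end_ms = ASYNC_RECEIVE_MS + POST_COMMENT_MS
```

Responsiveness improves from 3,100 to 25 milliseconds, while the end-to-end time only moves from 3,100 to 3,025 milliseconds, which is why the example improves responsiveness rather than performance.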
The difference in response time between the two examples in Figure 14-13 from 3,100 milliseconds to 25 milliseconds is staggering. There is one caveat. On the synchronous path shown on the top of the diagram, the end user is guaranteed that the comment has been posted. However, on the bottom path there is only the acknowledgement of the post, with a future promise that eventually the comment will get posted. From the end user’s perspective, the comment has been posted. But what happens if the user had typed a bad word in the comment? In this case the comment would be rejected, but there is no way to get back to the end user. Or is there? In this example, assuming the user is registered with the website (which to post a comment they would have to be), a message could be sent to the user indicating a problem with the comment and some suggestions on how to repair it. This is a simple example. What about a more complicated example where the purchase of some stock is taking place asynchronously (called a stock trade) and there is no way to get back to the user? 图 14-13 中两个示例之间的响应时间差异从 3100 毫秒到 25 毫秒是惊人的。有一个警告。在图表顶部显示的同步路径上,最终用户可以保证评论已被发布。然而,在底部路径上只有对发布的确认,并且未来承诺评论最终会被发布。从最终用户的角度来看,评论已经发布。但是,如果用户在评论中输入了不当词汇会发生什么呢?在这种情况下,评论将被拒绝,但没有办法通知最终用户。或者说有办法吗?在这个例子中,假设用户已在网站上注册(为了发布评论,他们必须注册),可以向用户发送一条消息,指示评论存在问题并提供一些修复建议。这是一个简单的例子。那么,如果发生更复杂的情况,比如某些股票的购买是异步进行的(称为股票交易),而且没有办法联系到用户呢?
The main issue with asynchronous communications is error handling. While responsiveness is significantly improved, it is difficult to address error conditions, adding to the complexity of the event-driven system. The next section addresses this issue with a pattern of reactive architecture called the workflow event pattern. 异步通信的主要问题是错误处理。虽然响应性显著提高,但处理错误情况变得困难,增加了事件驱动系统的复杂性。下一节将通过一种称为工作流事件模式的反应式架构模式来解决这个问题。
Error Handling 错误处理
The workflow event pattern of reactive architecture is one way of addressing the issues associated with error handling in an asynchronous workflow. This pattern is a reactive architecture pattern that addresses both resiliency and responsiveness. In other words, the system can be resilient in terms of error handling without an impact to responsiveness. 反应式架构的工作流事件模式是解决异步工作流中与错误处理相关问题的一种方式。该模式是一种反应式架构模式,既关注弹性又关注响应性。换句话说,系统可以在错误处理方面具有弹性,而不会影响响应性。
The workflow event pattern leverages delegation, containment, and repair through the use of a workflow delegate, as illustrated in Figure 14-14. The event producer asynchronously passes data through a message channel to the event consumer. If the event consumer experiences an error while processing the data, it immediately delegates that error to the workflow processor and moves on to the next message in the event queue. In this way, overall responsiveness is not impacted because the next message is immediately processed. If the event consumer were to spend the time trying to figure out the error, then it would not be reading the next message in the queue, therefore impacting the responsiveness not only of the next message, but of all other messages waiting in the queue to be processed.
Once the workflow processor receives an error, it tries to figure out what is wrong with the message. This could be a static, deterministic error, or it could leverage some machine learning algorithms to analyze the message to see some anomaly in the data. Either way, the workflow processor programmatically (without human intervention) makes changes to the original data to try and repair it, and then sends it back to the originating queue. The event consumer sees this message as a new one and tries to process it again, hopefully this time with some success. Of course, there are many times when the workflow processor cannot determine what is wrong with the message. In these cases the workflow processor sends the message off to another queue, which is then received in what is usually called a “dashboard,” an application that looks similar to Microsoft’s Outlook or Apple’s Mail. This dashboard usually resides on the desktop of a person of importance, who then looks at the message, applies manual fixes to it, and then resubmits it to the original queue (usually through a reply-to message header variable).
Figure 14-14. Workflow event pattern of reactive architecture 图 14-14. 响应式架构的工作流事件模式
To illustrate the workflow event pattern, suppose a trading advisor in one part of the country accepts trade orders (instructions on what stock to buy and for how many shares) on behalf of a large trading firm in another part of the country. The advisor batches up the trade orders (what is usually called a basket) and asynchronously sends those to the large trading firm to be placed with a broker so the stock can be purchased. To simplify the example, suppose the contract for the trade instructions must adhere to the following:
Suppose the large trading firm receives the following basket of Apple (AAPL) trade orders from the trading advisor: 假设大型交易公司收到来自交易顾问的以下一篮子苹果(AAPL)交易订单:
Notice the fourth trade instruction (2WE35HF6DHF,BUY,AAPL,8756 SHARES) has the word SHARES after the number of shares for the trade. When these asynchronous trade orders are processed by the large trading firm without any error handling capabilities, the following error occurs within the trade placement service:
Exception in thread "main" java.lang.NumberFormatException: For input string: "8756 SHARES"
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
    at java.lang.Long.parseLong(Long.java:589)
    at java.lang.Long.<init>(Long.java:965)
    at trading.TradePlacement.execute(TradePlacement.java:23)
    at trading.TradePlacement.main(TradePlacement.java:29)
When this exception occurs, there is nothing the trade placement service can do other than possibly log the error condition, because this was an asynchronous request. In other words, there is no user to synchronously respond to and fix the error.
Applying the workflow event pattern can programmatically fix this error. Because the large trading firm has no control over the trading advisor and the corresponding trade order data it sends, it must react to fix the error itself (as illustrated in Figure 14-15). When the same error occurs (2WE35HF6DHF,BUY,AAPL,8756 SHARES), the Trade Placement service immediately delegates the error via asynchronous messaging to the Trade Placement Error service for error handling, passing along with the error information about the exception:
Trade Placed: 12654A87FR4,BUY,AAPL,1254
Trade Placed: 87R54E3068U,BUY,AAPL,3122
Trade Placed: 6R4NB7609JJ,BUY,AAPL,5433
Error Placing Trade: "2WE35HF6DHF,BUY,AAPL,8756 SHARES"
Sending to trade error processor <-- delegate the error fixing and move on
Trade Placed: 764980974R2,BUY,AAPL,1211
The Trade Placement Error service (acting as the workflow delegate) receives the error and inspects the exception. Seeing that it is an issue with the word SHARES in the number of shares field, the Trade Placement Error service strips off the word SHARES and resubmits the trade for reprocessing: 交易下单错误服务(作为工作流委托)接收错误并检查异常。看到这是与股份数量字段中的单词 SHARES 有关的问题,交易下单错误服务去掉单词 SHARES 并重新提交交易以进行重新处理:
Received Trade Order Error: 2WE35HF6DHF,BUY,AAPL,8756 SHARES
Trade fixed: 2WE35HF6DHF,BUY,AAPL,8756
Resubmitting Trade For Re-Processing
The fixed trade is then processed successfully by the trade placement service: 固定交易随后由交易下单服务成功处理:
trade placed: 1533G658HD8,BUY,AAPL,2654
trade placed: 2WE35HF6DHF,BUY,AAPL,8756 <-- this was the original trade in error
Figure 14-15. Error handling with the workflow event pattern 图 14-15. 使用工作流事件模式的错误处理
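The delegate-repair-resubmit cycle from this example can be sketched in Python. This is a single-threaded simulation under stated assumptions, not the trading firm's implementation: `place_trade`, `repair`, and `process` are hypothetical names, in-memory deques stand in for the message queues, a Python `ValueError` plays the role of the Java `NumberFormatException`, and the repair rule (keep the leading digits of the shares field) is one assumed deterministic fix.

```python
from collections import deque


def place_trade(order: str) -> str:
    """Trade Placement service: fails on a malformed shares field."""
    account, side, symbol, shares = order.split(",")
    int(shares)  # raises ValueError for "8756 SHARES"
    return f"Trade Placed: {order}"


def repair(order: str):
    """Workflow delegate: return a fixed order, or None if it cannot be fixed."""
    fields = order.split(",")
    if len(fields) != 4:
        return None
    shares = fields[3].strip().split(" ")[0]  # drop trailing words such as SHARES
    if not shares.isdecimal():
        return None
    return ",".join(fields[:3] + [shares])


def process(basket):
    """Place trades, delegating errors instead of stopping to analyze them."""
    queue, errors = deque(basket), deque()
    placed, dashboard = [], []
    while queue or errors:
        if queue:
            order = queue.popleft()
            try:
                placed.append(place_trade(order))
            except ValueError:
                errors.append(order)  # delegate the error and move on
        else:
            bad = errors.popleft()
            fixed = repair(bad)
            if fixed is None:
                dashboard.append(bad)  # needs manual repair
            else:
                queue.append(fixed)  # resubmitted as if it were a new message
    return placed, dashboard
```

In this sketch the repaired trade is placed after the trades that followed it in the basket, which is exactly the out-of-sequence behavior the pattern introduces.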
One of the consequences of the workflow event pattern is that messages in error are processed out of sequence when they are resubmitted. In our trading example, the order of messages matters, because all trades within a given account must be processed in order (for example, a SELL for IBM must occur before a BUY for AAPL within the same brokerage account). Although not impossible, it is a complex task to maintain message order within a given context (in this case the brokerage account number). One way to address this is for the Trade Placement service to queue and store the account number of the trade in error. Any trade with that same account number would be stored in a temporary queue for later processing (in FIFO order). Once the trade originally in error is fixed and processed, the Trade Placement service then de-queues the remaining trades for that same account and processes them in order.
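One way to picture the per-account holding queue is the sketch below. It is an assumption-laden illustration: the class and method names are hypothetical, the account number is assumed to be the first comma-separated field of the trade, and the hand-off to the error service is omitted.

```python
from collections import defaultdict, deque


def place_trade(order: str) -> None:
    """Hypothetical placement call; raises ValueError on a bad shares field."""
    int(order.split(",")[3])


class OrderedTradePlacement:
    def __init__(self, place=place_trade):
        self.place = place
        self.in_error = set()             # accounts with a trade awaiting repair
        self.held = defaultdict(deque)    # account -> later trades, FIFO
        self.placed = []

    def on_trade(self, order: str) -> None:
        account = order.split(",")[0]
        if account in self.in_error:
            self.held[account].append(order)  # park until the account is clear
            return
        try:
            self.place(order)
            self.placed.append(order)
        except ValueError:
            self.in_error.add(account)  # error delegated elsewhere (not shown)

    def on_fixed_trade(self, order: str) -> None:
        """Called when the repaired trade is resubmitted."""
        account = order.split(",")[0]
        self.place(order)
        self.placed.append(order)
        self.in_error.discard(account)
        # Drain held trades in order; stop if one of them fails in turn.
        while self.held[account] and account not in self.in_error:
            self.on_trade(self.held[account].popleft())
```

Trades for other accounts keep flowing while one account is blocked, so the ordering guarantee costs responsiveness only within the affected account.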
Preventing Data Loss 防止数据丢失
Data loss is always a primary concern when dealing with asynchronous communications. Unfortunately, there are many places for data loss to occur within an event-driven architecture. By data loss we mean a message getting dropped or never making it to its final destination. Fortunately, there are basic out-of-the-box techniques that can be leveraged to prevent data loss when using asynchronous messaging.
To illustrate the issues associated with data loss within event-driven architecture, suppose Event Processor A asynchronously sends a message to a queue. Event Processor B accepts the message and inserts the data within the message into a database. As illustrated in Figure 14-16, three areas of data loss can occur within this typical scenario: 为了说明事件驱动架构中与数据丢失相关的问题,假设事件处理器 A 异步地将消息发送到队列。事件处理器 B 接收该消息并将消息中的数据插入数据库。如图 14-16 所示,在这种典型场景中可能发生三个数据丢失区域:
1. The message never makes it to the queue from Event Processor A; or even if it does, the broker goes down before the next event processor can retrieve the message. 消息从事件处理器 A 永远无法到达队列;即使到达了,代理在下一个事件处理器可以检索消息之前就崩溃了。
2. Event Processor B de-queues the next available message and crashes before it can process the event. 事件处理器 B 从队列中取出下一个可用消息,并在处理事件之前崩溃。
3. Event Processor B is unable to persist the message to the database due to some data error. 事件处理器 B 因数据错误无法将消息持久化到数据库。
Figure 14-16. Where data loss can happen within an event-driven architecture 图 14-16. 事件驱动架构中数据丢失可能发生的位置
Each of these areas of data loss can be mitigated through basic messaging techniques. Issue 1 (the message never makes it to the queue) is easily solved by leveraging persistent message queues, along with something called synchronous send. Persisted message queues support what is known as guaranteed delivery. When the message broker receives the message, it not only stores it in memory for fast retrieval, but also persists the message in some sort of physical data store (such as a filesystem or database). If the message broker goes down, the message is physically stored on disk so that when the message broker comes back up, the message is available for processing. Synchronous send does a blocking wait in the message producer until the broker has acknowledged that the message has been persisted. With these two basic techniques there is no way to lose a message between the event producer and the queue because the message is either still with the message producer or persisted within the queue. 每个数据丢失的领域都可以通过基本的消息传递技术来减轻。问题 1(消息从未到达队列)可以通过利用持久化消息队列以及一种称为同步发送的技术轻松解决。持久化消息队列支持所谓的保证交付。当消息代理接收到消息时,它不仅将其存储在内存中以便快速检索,还将消息持久化存储在某种物理数据存储中(例如文件系统或数据库)。如果消息代理出现故障,消息将物理存储在磁盘上,以便当消息代理恢复时,消息可以进行处理。同步发送在消息生产者中进行阻塞等待,直到代理确认消息已被持久化。通过这两种基本技术,在事件生产者和队列之间没有丢失消息的方式,因为消息要么仍然在消息生产者那里,要么已持久化在队列中。
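To make the combination of a persisted queue and synchronous send concrete, here is a small file-backed sketch. It is an assumption-laden stand-in rather than a real broker: `PersistedQueue`, `sendSync`, and `recover` are hypothetical names, messages are stored one per line, and returning from the write call plays the role of the broker's acknowledgment.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/** File-backed stand-in for a persisted queue (hypothetical, not a real broker API). */
class PersistedQueue {
    private final Path store;

    PersistedQueue(Path store) {
        this.store = store;
    }

    /**
     * Synchronous send: the call returns only after the message has been
     * appended to the store, which stands in for the broker's acknowledgment.
     */
    void sendSync(String message) throws IOException {
        Files.write(store, List.of(message), StandardCharsets.UTF_8,
                StandardOpenOption.CREATE,
                StandardOpenOption.APPEND,
                StandardOpenOption.SYNC);   // request write-through to storage
    }

    /** After a broker restart, the pending messages are read back from the store. */
    List<String> recover() throws IOException {
        return Files.exists(store)
                ? Files.readAllLines(store, StandardCharsets.UTF_8)
                : List.of();
    }
}
```

If `sendSync` throws, the producer still holds the message and can retry; if it returns, the message survives a restart. Those are the two states the paragraph above relies on.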
Issue 2 (Event Processor B de-queues the next available message and crashes before it can process the event) can also be solved using a basic technique of messaging called client acknowledge mode. By default, when a message is de-queued, it is immediately removed from the queue (something called auto acknowledge mode). Client acknowledge mode keeps the message in the queue and attaches the client ID to the message so that no other consumers can read the message. With this mode, if Event Processor B crashes, the message is still preserved in the queue, preventing message loss in this part of the message flow. 问题 2(事件处理器 B 从队列中取出下一个可用消息并在处理事件之前崩溃)也可以通过一种基本的消息传递技术来解决,称为客户端确认模式。默认情况下,当消息被取出时,它会立即从队列中移除(称为自动确认模式)。客户端确认模式将消息保留在队列中,并将客户端 ID 附加到消息上,以便其他消费者无法读取该消息。在这种模式下,如果事件处理器 B 崩溃,消息仍然保留在队列中,从而防止在消息流的这一部分丢失消息。
Issue 3 (Event Processor B is unable to persist the message to the database due to some data error) is addressed through leveraging ACID (atomicity, consistency, isolation, durability) transactions via a database commit. Once the database commit happens, the data is guaranteed to be persisted in the database. Leveraging something called last participant support (LPS) removes the message from the persisted queue by acknowledging that processing has been completed and that the message has been persisted. This guarantees the message is not lost during the transit from Event Processor A all the way to the database. These techniques are illustrated in Figure 14-17. 问题 3(事件处理器 B 由于某些数据错误无法将消息持久化到数据库)通过利用 ACID(原子性、一致性、隔离性、持久性)事务通过数据库提交来解决。一旦发生数据库提交,数据就保证会被持久化到数据库中。利用一种称为最后参与者支持(LPS)的机制,通过确认处理已完成并且消息已被持久化,从持久化队列中移除消息。这保证了消息在从事件处理器 A 传输到数据库的过程中不会丢失。这些技术在图 14-17 中进行了说明。
Figure 14-17. Preventing data loss within an event-driven architecture 图 14-17. 在事件驱动架构中防止数据丢失
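The handling of issues 2 and 3 comes down to ordering: the message stays in the queue until the consumer has committed its work, and only then is it acknowledged. The sketch below simulates that ordering in memory. `AckQueue` and `EventProcessorB` are hypothetical classes, the "database" is a plain list, and last participant support is represented only by acknowledging after the commit.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** In-memory stand-in for a queue read in client acknowledge mode (hypothetical). */
class AckQueue {
    private final Deque<String> messages = new ArrayDeque<>();
    private String inFlight;   // delivered to a consumer but not yet acknowledged

    void send(String message) {
        messages.addLast(message);
    }

    /** Hands out the next message but keeps it until acknowledge() is called. */
    String receive() {
        if (inFlight == null) {
            inFlight = messages.pollFirst();
        }
        return inFlight;
    }

    /** Removes the in-flight message for good. */
    void acknowledge() {
        inFlight = null;
    }

    /** The consumer went away: its unacknowledged message becomes available again. */
    void consumerLost() {
        if (inFlight != null) {
            messages.addFirst(inFlight);
            inFlight = null;
        }
    }

    int depth() {
        return messages.size() + (inFlight == null ? 0 : 1);
    }
}

/** Event Processor B: acknowledges only after the simulated database commit. */
class EventProcessorB {
    final List<String> database = new ArrayList<>();

    void consume(AckQueue queue, boolean crashBeforeCommit) {
        String message = queue.receive();
        if (message == null) {
            return;                   // nothing to process
        }
        if (crashBeforeCommit) {
            queue.consumerLost();     // simulated crash: no commit, no acknowledgment
            return;
        }
        database.add(message);        // stands in for the ACID commit
        queue.acknowledge();          // last step: remove the message from the queue
    }
}
```

A crash before the commit leaves the queue depth unchanged, so the message is redelivered on the next attempt rather than lost.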
Broadcast Capabilities 广播能力
One of the other unique characteristics of event-driven architecture is the capability to broadcast events without knowledge of who (if anyone) is receiving the message and what they do with it. This technique, which is illustrated in Figure 14-18, shows that when a producer publishes a message, that same message is received by multiple subscribers. 事件驱动架构的另一个独特特性是能够广播事件,而无需知道谁(如果有的话)正在接收消息以及他们如何处理它。这个技术在图 14-18 中进行了说明,显示当一个生产者发布一条消息时,多个订阅者会接收到同样的消息。
Figure 14-18. Broadcasting events to other event processors 图 14-18. 将事件广播到其他事件处理器
Broadcasting is perhaps the highest level of decoupling between event processors because the producer of the broadcast message usually does not know which event processors will be receiving the broadcast message and more importantly, what they will do with the message. Broadcast capabilities are an essential part of patterns for eventual consistency, complex event processing (CEP), and a host of other situations. Consider frequent changes in stock prices for instruments traded on the stock market. Every ticker (the current price of a particular stock) might influence a number of things. However, the service publishing the latest price simply broadcasts it with no knowledge of how that information will be used. 广播可能是事件处理器之间解耦的最高级别,因为广播消息的生产者通常不知道哪些事件处理器将接收广播消息,更重要的是,他们将如何处理该消息。广播能力是最终一致性、复杂事件处理(CEP)以及许多其他情况模式的重要组成部分。考虑股票市场上交易的工具的股票价格频繁变化。每个股票代码(特定股票的当前价格)可能会影响许多事情。然而,发布最新价格的服务只是广播它,而不知道该信息将如何被使用。
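As a rough illustration of this decoupling, the following sketch models a topic in a single process. The `Topic` class is a hypothetical stand-in for a broker topic; the point is only that `publish` has no knowledge of how many subscribers exist or what each one does with the message.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Minimal in-process topic: every subscriber receives every published message. */
class Topic<T> {
    private final List<Consumer<T>> subscribers = new ArrayList<>();

    void subscribe(Consumer<T> subscriber) {
        subscribers.add(subscriber);
    }

    /** The publisher neither knows nor cares who (if anyone) is listening. */
    void publish(T message) {
        for (Consumer<T> subscriber : subscribers) {
            subscriber.accept(message);
        }
    }
}
```

A price publisher would call `publish` once per ticker update, while an analytics processor and an alerting processor each register their own callback with `subscribe`. Adding a third subscriber requires no change to the publisher.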
Request-Reply 请求-回复
So far in this chapter we’ve dealt with asynchronous requests that don’t need an immediate response from the event consumer. But what if an order ID is needed when ordering a book? What if a confirmation number is needed when booking a flight? These are examples of communication between services or event processors that require some sort of synchronous communication. 到目前为止,在本章中我们处理了不需要事件消费者立即响应的异步请求。但是,如果在订购一本书时需要订单 ID 呢?如果在预订航班时需要确认号码呢?这些是服务或事件处理器之间需要某种同步通信的例子。
In event-driven architecture, synchronous communication is accomplished through request-reply messaging (sometimes referred to as pseudosynchronous communications). Each event channel within request-reply messaging consists of two queues: a request queue and a reply queue. The initial request for information is asynchronously sent to the request queue, and then control is returned to the message producer. The message producer then does a blocking wait on the reply queue, waiting for the response. The message consumer receives and processes the message and then sends the response to the reply queue. The event producer then receives the message with the response data. This basic flow is illustrated in Figure 14-19. 在事件驱动架构中,同步通信是通过请求-回复消息实现的(有时称为伪同步通信)。请求-回复消息中的每个事件通道由两个队列组成:请求队列和回复队列。初始的信息请求异步发送到请求队列,然后控制权返回给消息生产者。消息生产者随后在回复队列上进行阻塞等待,等待响应。消息消费者接收并处理消息,然后将响应发送到回复队列。事件生产者随后接收带有响应数据的消息。这个基本流程在图 14-19 中进行了说明。
Figure 14-19. Request-reply message processing 图 14-19. 请求-回复消息处理
There are two primary techniques for implementing request-reply messaging. The first (and most common) technique is to use a correlation ID contained in the message header. A correlation ID is a field in the reply message that is usually set to the message ID of the original request message. This technique, as illustrated in Figure 14-20, works as follows, with the message ID indicated with ID, and the correlation ID indicated with CID: 实现请求-回复消息的主要有两种技术。第一种(也是最常见的)技术是使用包含在消息头中的关联 ID。关联 ID 是回复消息中的一个字段,通常设置为原始请求消息的消息 ID。该技术如图 14-20 所示,工作原理如下,其中消息 ID 用 ID 表示,关联 ID 用 CID 表示:
1. The event producer sends a message to the request queue and records the unique message ID (in this case ID 124). Notice that the correlation ID (CID) in this case is null. 事件生产者将消息发送到请求队列,并记录唯一的消息 ID(在本例中为 ID 124)。请注意,在这种情况下,关联 ID(CID)为 null。
2. The event producer now does a blocking wait on the reply queue with a message filter (also called a message selector), where the correlation ID in the message header equals the original message ID (in this case 124). Notice there are two messages in the reply queue: message ID 855 with correlation ID 120, and message ID 856 with correlation ID 122. Neither of these messages will be picked up because the correlation ID does not match what the event producer is looking for (CID 124). 事件生产者现在在回复队列上进行阻塞等待,并使用消息过滤器(也称为消息选择器),其中消息头中的关联 ID 等于原始消息 ID(在本例中为 124)。请注意,回复队列中有两条消息:消息 ID 855,关联 ID 120,以及消息 ID 856,关联 ID 122。这两条消息都不会被处理,因为关联 ID 与事件生产者所寻找的(CID 124)不匹配。
3. The event consumer receives the message (ID 124) and processes the request. 事件消费者接收消息(ID 124)并处理请求。
4. The event consumer creates the reply message containing the response and sets the correlation ID (CID) in the message header to the original message ID (124). 事件消费者创建包含响应的回复消息,并将消息头中的关联 ID (CID) 设置为原始消息 ID (124)。
5. The event consumer sends the new message (ID 857) to the reply queue. 事件消费者将新消息(ID 857)发送到回复队列。
6. The event producer receives the message because the correlation ID (124) matches the message selector from step 2. 事件生产者接收到消息,因为关联 ID (124) 与步骤 2 中的消息选择器匹配。
Figure 14-20. Request-reply message processing using a correlation ID 图 14-20. 使用关联 ID 的请求-回复消息处理
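The correlation ID flow can be approximated with in-process queues and two threads. This is a sketch under stated assumptions: `Message`, `ReplyQueue`, and `CorrelationIdDemo` are hypothetical classes, the selector is a simple equality check on the correlation ID, and the message IDs are the ones used in the walkthrough above.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical message with the two header fields the technique relies on. */
class Message {
    final String id;              // message ID (ID)
    final String correlationId;   // correlation ID (CID); null on a request
    final String body;

    Message(String id, String correlationId, String body) {
        this.id = id;
        this.correlationId = correlationId;
        this.body = body;
    }
}

/** Shared reply queue whose receive() applies a correlation ID selector. */
class ReplyQueue {
    private final List<Message> messages = new ArrayList<>();

    synchronized void send(Message message) {
        messages.add(message);
        notifyAll();                              // wake any blocked receivers
    }

    /** Blocks until a reply with the given CID arrives; returns null on timeout. */
    synchronized Message receive(String selectorCid, long timeoutMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            for (Iterator<Message> it = messages.iterator(); it.hasNext(); ) {
                Message candidate = it.next();
                if (selectorCid.equals(candidate.correlationId)) {
                    it.remove();
                    return candidate;
                }
            }
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                return null;
            }
            wait(remaining);
        }
    }

    synchronized int size() {
        return messages.size();
    }
}

class CorrelationIdDemo {
    /** Runs one request-reply exchange and returns the reply body. */
    static String exchange(ReplyQueue replyQueue) {
        BlockingQueue<Message> requestQueue = new LinkedBlockingQueue<>();
        Thread consumer = new Thread(() -> {
            try {
                Message request = requestQueue.take();                   // step 3
                // Steps 4 and 5: reply 857 carries the request ID as its CID.
                replyQueue.send(new Message("857", request.id, "order accepted"));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            requestQueue.put(new Message("124", null, "place order"));   // step 1
            Message reply = replyQueue.receive("124", 2000);             // steps 2 and 6
            consumer.join();
            return reply == null ? null : reply.body;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Replies addressed to other producers (such as CID 120 and CID 122) can sit in the same `ReplyQueue` and are left untouched by the selector.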
The other technique used to implement request-reply messaging is to use a temporary queue for the reply queue. A temporary queue is dedicated to the specific request, created when the request is made and deleted when the request ends. This technique, as illustrated in Figure 14-21, does not require a correlation ID because the temporary queue is a dedicated queue only known to the event producer for the specific request. The temporary queue technique works as follows: 实现请求-回复消息传递的另一种技术是使用临时队列作为回复队列。临时队列专用于特定请求,在请求发出时创建,并在请求结束时删除。如图 14-21 所示,这种技术不需要关联 ID,因为临时队列是仅为特定请求而创建的专用队列,仅事件生产者知道。临时队列技术的工作原理如下:
1. The event producer creates a temporary queue (or one is automatically created, depending on the message broker) and sends a message to the request queue, passing the name of the temporary queue in the reply-to header (or some other agreed-upon custom attribute in the message header). 事件生产者创建一个临时队列(或者根据消息代理自动创建一个),并将消息发送到请求队列,在 reply-to 头中传递临时队列的名称(或消息头中其他约定的自定义属性)。
2. The event producer does a blocking wait on the temporary reply queue. No message selector is needed because any message sent to this queue belongs solely to the event producer that originally sent the message. 事件生产者在临时回复队列上进行阻塞等待。无需消息选择器,因为发送到该队列的任何消息仅属于最初发送该消息的事件生产者。
3. The event consumer receives the message, processes the request, and sends a response message to the reply queue named in the reply-to header. 事件消费者接收消息,处理请求,并将响应消息发送到 reply-to 头中指定的回复队列。
4. The event producer receives the message and deletes the temporary queue. 事件生产者接收消息并删除临时队列。
Figure 14-21. Request-reply message processing using a temporary queue 图 14-21. 使用临时队列的请求-回复消息处理
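For comparison, the temporary queue flow can be sketched the same way. The assumptions here are that `Request` and `TemporaryQueueDemo` are hypothetical classes and that the reply-to header carries a direct reference to the temporary queue instead of a queue name, since there is no broker to resolve names in a single-process example.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical request whose reply-to header references the producer's temporary queue. */
class Request {
    final String body;
    final BlockingQueue<String> replyTo;

    Request(String body, BlockingQueue<String> replyTo) {
        this.body = body;
        this.replyTo = replyTo;
    }
}

class TemporaryQueueDemo {
    /** Runs one exchange over a per-request reply queue and returns the reply. */
    static String exchange(String requestBody) {
        BlockingQueue<Request> requestQueue = new LinkedBlockingQueue<>();
        Thread consumer = new Thread(() -> {
            try {
                Request request = requestQueue.take();               // step 3
                request.replyTo.put("confirmed: " + request.body);   // reply via reply-to
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            // Step 1: create the temporary queue and pass it in the reply-to header.
            BlockingQueue<String> temporaryQueue = new LinkedBlockingQueue<>();
            requestQueue.put(new Request(requestBody, temporaryQueue));
            // Step 2: blocking wait; no selector is needed on a dedicated queue.
            String reply = temporaryQueue.poll(2, TimeUnit.SECONDS);
            consumer.join();
            // Step 4: the temporary queue is discarded once it goes out of scope.
            return reply;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

The producer code is simpler than in the correlation ID version, but every call allocates and discards a queue, which is the per-request cost that a real broker would also have to bear.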
While the temporary queue technique is much simpler, the message broker must create a temporary queue for each request made and then delete it immediately afterward. Large messaging volumes can significantly slow down the message broker and impact overall performance and responsiveness. For this reason we usually recommend using the correlation ID technique. 虽然临时队列技术要简单得多,但消息代理必须为每个请求创建一个临时队列,然后立即将其删除。大量消息的传递可能会显著减慢消息代理的速度,并影响整体性能和响应能力。因此,我们通常建议使用关联 ID 技术。
Choosing Between Request-Based and Event-Based 选择基于请求和基于事件的方式
The request-based model and event-based model are both viable approaches for designing software systems. However, choosing the right model is essential to the overall success of the system. We recommend choosing the request-based model for well-structured, data-driven requests (such as retrieving customer profile data) when certainty and control over the workflow is needed. We recommend choosing the event-based model for flexible, action-based events that require high levels of responsiveness and scale, with complex and dynamic user processing. 请求驱动模型和事件驱动模型都是设计软件系统的可行方法。然而,选择正确的模型对系统的整体成功至关重要。我们建议在需要对工作流程有确定性和控制时,选择请求驱动模型,适用于结构良好、数据驱动的请求(例如检索客户档案数据)。我们建议在需要高响应性和可扩展性的灵活、基于动作的事件中,选择事件驱动模型,适用于复杂和动态的用户处理。
Understanding the trade-offs with the event-based model also helps decide which one is the best fit. Table 14-3 lists the advantages and disadvantages of the event-based model of event-driven architecture. 理解事件驱动模型的权衡也有助于决定哪个是最合适的。表 14-3 列出了事件驱动架构的事件驱动模型的优缺点。
Table 14-3. Trade-offs of the event-driven model 表 14-3. 事件驱动模型的权衡
| Advantages over request-based 相较于基于请求的优势 | Trade-offs 权衡 |
| :--- | :--- |
| Better response to dynamic user content 更好地响应动态用户内容 | Only supports eventual consistency 仅支持最终一致性 |
| Better scalability and elasticity 更好的可扩展性和弹性 | Less control over processing flow 对处理流程的控制较少 |
| Better agility and change management 更好的敏捷性和变更管理 | Less certainty over outcome of event flow 事件流程结果的不确定性较大 |
| Better adaptability and extensibility 更好的适应性和可扩展性 | Difficult to test and debug 难以测试和调试 |
| Better responsiveness and performance 更好的响应能力和性能 | |
| Better real-time decision making 更好的实时决策制定 | |
| Better reaction to situational awareness 更好地应对情境意识 | |
Hybrid Event-Driven Architectures 混合事件驱动架构
While many applications leverage the event-driven architecture style as the primary overarching architecture, in many cases event-driven architecture is used in conjunction with other architecture styles, forming what is known as a hybrid architecture. Some common architecture styles that leverage event-driven architecture as part of another architecture style include microservices and space-based architecture. Other hybrids that are possible include an event-driven microkernel architecture and an event-driven pipeline architecture. 虽然许多应用程序将事件驱动架构风格作为主要的总体架构,但在许多情况下,事件驱动架构与其他架构风格结合使用,形成所谓的混合架构。一些常见的架构风格将事件驱动架构作为其他架构风格的一部分,包括微服务和基于空间的架构。其他可能的混合架构包括事件驱动微内核架构和事件驱动管道架构。
Adding event-driven architecture to any architecture style helps remove bottlenecks, provides a back pressure point in the event requests get backed up, and provides a level of user responsiveness not found in other architecture styles. Both microservices and space-based architecture leverage messaging for data pumps, asynchronously sending data to another processor that in turn updates data in a database. Both also leverage event-driven architecture to provide a level of programmatic scalability to services in a microservices architecture and processing units in a space-based architecture when using messaging for interservice communication. 将事件驱动架构添加到任何架构风格中有助于消除瓶颈,在事件请求积压时提供一个反压点,并提供其他架构风格中没有的用户响应级别。微服务和基于空间的架构都利用消息传递作为数据泵,异步地将数据发送到另一个处理器,该处理器又更新数据库中的数据。两者还利用事件驱动架构为微服务架构中的服务和基于空间的架构中的处理单元提供程序化可扩展性,当使用消息传递进行服务间通信时。
Architecture Characteristics Ratings 架构特性评级
A one-star rating in the characteristics ratings table in Figure 14-22 means the specific architecture characteristic isn’t well supported in the architecture, whereas a five-star rating means the architecture characteristic is one of the strongest features in the architecture style. The definition for each characteristic identified in the scorecard can be found in Chapter 4. 在图 14-22 的特征评分表中,一星评级意味着特定的架构特征在架构中支持不佳,而五星评级则意味着该架构特征是架构风格中最强的特征之一。评分卡中识别的每个特征的定义可以在第 4 章找到。
Event-driven architecture is primarily a technically partitioned architecture in that any particular domain is spread across multiple event processors and tied together through mediators, queues, and topics. Changes to a particular domain usually impact many event processors, mediators, and other messaging artifacts, which is why event-driven architecture is not domain partitioned. 事件驱动架构主要是一种技术分区架构,因为任何特定领域都分布在多个事件处理器上,并通过中介、队列和主题连接在一起。对特定领域的更改通常会影响许多事件处理器、中介和其他消息传递工件,这就是为什么事件驱动架构不是领域分区的原因。
| Architecture characteristic 架构特征 | Star rating 星级评分 |
| :---: | :---: |
| Partitioning type 分区类型 | Technical 技术 |
| Number of quanta 量子数 | 1 to many 1 对多 |
| Deployability 可部署性 | (rating not recoverable) |
| Elasticity 弹性 | (rating not recoverable) |
| Evolutionary 演化的 | ★★★★★ |
| Fault tolerance 容错 | ★★★★★ |
| Modularity 模块化 | (rating not recoverable) |
| Overall cost 总体成本 | (rating not recoverable) |
| Performance 性能 | ★★★★★ |
| Reliability 可靠性 | (rating not recoverable) |
| Scalability 可扩展性 | ★★★★★ |
| Simplicity 简单性 | (rating not recoverable) |
| Testability 可测试性 | (rating not recoverable) |
Figure 14-22. Event-driven architecture characteristics ratings 图 14-22. 事件驱动架构特征评分
The number of quanta within event-driven architecture can vary from one to many, which is usually based on the database interactions within each event processor and request-reply processing. Even though all communication in an event-driven architecture is asynchronous, if multiple event processors share a single database instance, they would all be contained within the same architectural quantum. The same is true for request-reply processing: even though the communication is still asynchronous between the event processors, if a response is needed right away from the event consumer, it ties those event processors together synchronously; hence they belong to the same quantum. 事件驱动架构中的量子数量可以从一个到多个不等,这通常取决于每个事件处理器中的数据库交互以及请求-回复处理。尽管事件驱动架构中的所有通信都是异步的,但如果多个事件处理器共享一个数据库实例,它们将都包含在同一个架构量子中。请求-回复处理也是如此:尽管事件处理器之间的通信仍然是异步的,但如果需要事件消费者立即给出响应,这将使这些事件处理器同步地绑定在一起;因此它们属于同一个量子。
To illustrate this point, consider the example where one event processor sends a request to another event processor to place an order. The first event processor must wait for an order ID from the other event processor to continue. If the second event processor that places the order and generates an order ID is down, the first event processor cannot continue. Therefore, they are part of the same architecture quantum and share the same architectural characteristics, even though they are both sending and receiving asynchronous messages. 为了说明这一点,考虑一个例子,其中一个事件处理器向另一个事件处理器发送请求以下订单。第一个事件处理器必须等待来自另一个事件处理器的订单 ID 才能继续。如果负责下订单并生成订单 ID 的第二个事件处理器宕机,第一个事件处理器就无法继续。因此,它们是同一架构量的一部分,并共享相同的架构特征,即使它们都在发送和接收异步消息。
Event-driven architecture gains five stars for performance, scalability, and fault tolerance, the primary strengths of this architecture style. High performance is achieved through asynchronous communications combined with highly parallel processing. High scalability is realized through the programmatic load balancing of event processors (also called competing consumers). As the request load increases, additional event processors can be programmatically added to handle the additional requests. Fault tolerance is achieved through highly decoupled and asynchronous event processors that provide eventual consistency and eventual processing of event workflows. Provided that the user interface or an event processor making a request does not need an immediate response, promises and futures can be leveraged to process the event at a later time if other downstream processors are not available. 事件驱动架构在性能、可扩展性和容错性方面获得了五颗星,这是这种架构风格的主要优势。通过异步通信结合高度并行处理,实现了高性能。通过事件处理器(也称为竞争消费者)的程序化负载均衡,实现了高可扩展性。随着请求负载的增加,可以程序化地添加额外的事件处理器来处理额外的请求。容错性通过高度解耦和异步的事件处理器实现,这些处理器提供最终一致性和事件工作流的最终处理。只要用户界面或发出请求的事件处理器不需要立即响应,当其他下游处理器不可用时,就可以利用 promise 和 future 在稍后处理事件。
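The competing consumers idea mentioned above can be illustrated with a shared queue and a configurable number of worker threads. This is only a sketch: `CompetingConsumers.drain` is a hypothetical helper, the "processing" is a counter increment, and scaling is reduced to choosing the `consumers` argument.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CompetingConsumers {
    /** Drains the queue with the given number of consumers; returns the count processed. */
    static int drain(BlockingQueue<String> queue, int consumers) {
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(consumers);
        for (int i = 0; i < consumers; i++) {
            pool.execute(() -> {
                // poll() hands each message to exactly one of the competing consumers.
                while (queue.poll() != null) {
                    processed.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("consumers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return processed.get();
    }
}
```

Raising the consumer count adds capacity without any change to the producers, because the queue itself decides which consumer receives each message.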
Overall simplicity and testability rate relatively low with event-driven architecture, mostly due to the nondeterministic and dynamic event flows typically found within this architecture style. While deterministic flows within the request-based model are relatively easy to test because the paths and outcomes are generally known, such is not the case with the event-driven model. Sometimes it is not known how event processors will react to dynamic events, and what messages they might produce. These “event tree diagrams” can be extremely complex, generating hundreds to even thousands of scenarios, making it very difficult to govern and test. 事件驱动架构的整体简单性和可测试性相对较低,这主要是由于这种架构风格中通常存在的非确定性和动态事件流。虽然基于请求的模型中的确定性流相对容易测试,因为路径和结果通常是已知的,但事件驱动模型则不是这样。有时无法知道事件处理器将如何对动态事件做出反应,以及它们可能产生什么消息。这些“事件树图”可能非常复杂,生成数百甚至数千种场景,使得治理和测试变得非常困难。
Finally, event-driven architectures are highly evolutionary, hence the five-star rating. Adding new features through existing or new event processors is relatively straightforward, particularly in the broker topology. By providing hooks via published messages in the broker topology, the data is already made available, hence no changes are required in the infrastructure or existing event processors to add that new functionality. 最后,事件驱动架构具有高度的演化性,因此获得了五星评级。通过现有或新的事件处理器添加新功能相对简单,特别是在代理拓扑中。通过在代理拓扑中通过发布的消息提供钩子,数据已经可用,因此在基础设施或现有事件处理器中添加新功能时无需进行更改。
CHAPTER 15 第 15 章
Space-Based Architecture Style 基于空间的架构风格
Most web-based business applications follow the same general request flow: a request from a browser hits the web server, then an application server, then finally the database server. While this pattern works great for a small set of users, bottlenecks start appearing as the user load increases, first at the web-server layer, then at the application-server layer, and finally at the database-server layer. The usual response to bottlenecks based on an increase in user load is to scale out the web servers. This is relatively easy and inexpensive, and it sometimes works to address the bottleneck issues. However, in most cases of high user load, scaling out the web-server layer just moves the bottleneck down to the application server. Scaling application servers can be more complex and expensive than web servers and usually just moves the bottleneck down to the database server, which is even more difficult and expensive to scale. Even if you can scale the database, what you eventually end up with is a triangle-shaped topology, with the widest part of the triangle being the web servers (easiest to scale) and the smallest part being the database (hardest to scale), as illustrated in Figure 15-1. 大多数基于 Web 的业务应用程序遵循相同的一般请求流程:来自浏览器的请求首先到达 Web 服务器,然后是应用服务器,最后是数据库服务器。虽然这种模式在少量用户的情况下效果很好,但随着用户负载的增加,瓶颈开始出现,首先是在 Web 服务器层,然后是在应用服务器层,最后是在数据库服务器层。针对用户负载增加而导致的瓶颈的通常响应是扩展 Web 服务器。这相对简单且成本低,有时可以解决瓶颈问题。然而,在大多数高用户负载的情况下,扩展 Web 服务器层只是将瓶颈转移到应用服务器。扩展应用服务器可能比扩展 Web 服务器更复杂且成本更高,通常只是将瓶颈转移到数据库服务器,而扩展数据库服务器则更加困难且成本更高。即使你可以扩展数据库,最终得到的也是一个三角形拓扑,三角形的最宽部分是 Web 服务器(最容易扩展),而最小部分是数据库(最难扩展),如图 15-1 所示。
In any high-volume application with a large concurrent user load, the database will usually be the final limiting factor in how many transactions you can process concurrently. While various caching technologies and database scaling products help to address these issues, the fact remains that scaling out a normal application for extreme loads is a very difficult proposition. 在任何高并发用户负载的大型高流量应用中,数据库通常是限制您可以并发处理的事务数量的最终因素。虽然各种缓存技术和数据库扩展产品有助于解决这些问题,但事实仍然是,为极端负载扩展一个普通应用是一个非常困难的任务。
Figure 15-1. Scalability limits within a traditional web-based topology 图 15-1. 传统基于网络的拓扑中的可扩展性限制
The space-based architecture style is specifically designed to address problems involving high scalability, elasticity, and high concurrency issues. It is also a useful architecture style for applications that have variable and unpredictable concurrent user volumes. Solving the extreme and variable scalability issue architecturally is often a better approach than trying to scale out a database or retrofit caching technologies into a nonscalable architecture. 基于空间的架构风格专门设计用于解决涉及高可扩展性、弹性和高并发问题。对于具有可变和不可预测的并发用户量的应用程序,这也是一种有用的架构风格。从架构上解决极端和可变的可扩展性问题通常比尝试扩展数据库或将缓存技术改造到不可扩展的架构中更好。
General Topology 一般拓扑
Space-based architecture gets its name from the concept of tuple space, the technique of using multiple parallel processors communicating through shared memory. High scalability, high elasticity, and high performance are achieved by removing the central database as a synchronous constraint in the system and instead leveraging replicated in-memory data grids. Application data is kept in-memory and replicated among all the active processing units. When a processing unit updates data, it asynchronously sends that data to the database, usually via messaging with persistent queues. Processing units start up and shut down dynamically as user load increases and decreases, thereby addressing variable scalability. Because there is no central database involved in the standard transactional processing of the application, the database bottleneck is removed, thus providing near-infinite scalability within the application. 基于空间的架构得名于元组空间的概念,这是一种使用多个并行处理器通过共享内存进行通信的技术。通过消除中央数据库作为系统中的同步约束,并利用复制的内存数据网格,实现了高可扩展性、高弹性和高性能。应用数据保存在内存中,并在所有活跃的处理单元之间进行复制。当处理单元更新数据时,它会异步地将数据发送到数据库,通常通过持久队列进行消息传递。处理单元根据用户负载的增加和减少动态启动和关闭,从而解决了可变的可扩展性。由于在应用程序的标准事务处理过程中没有涉及中央数据库,因此消除了数据库瓶颈,从而在应用程序内提供了近乎无限的可扩展性。
There are several architecture components that make up a space-based architecture: a processing unit containing the application code, virtualized middleware used to manage and coordinate the processing units, data pumps to asynchronously send updated data to the database, data writers that perform the updates from the data pumps, and data readers that read database data and deliver it to processing units upon startup. Figure 15-2 illustrates these primary architecture components. 空间基础架构由几个架构组件组成:包含应用程序代码的处理单元、用于管理和协调处理单元的虚拟化中间件、用于异步发送更新数据到数据库的数据泵、从数据泵执行更新的数据写入器,以及在启动时读取数据库数据并将其传递给处理单元的数据读取器。图 15-2 展示了这些主要架构组件。
The processing unit (illustrated in Figure 15-3) contains the application logic (or portions of the application logic). This usually includes web-based components as well as backend business logic. The contents of the processing unit vary based on the type of application. Smaller web-based applications would likely be deployed into a single processing unit, whereas larger applications may split the application functionality into multiple processing units based on the functional areas of the application. The processing unit can also contain small, single-purpose services (as with microservices). In addition to the application logic, the processing unit also contains an in-memory data grid and replication engine usually implemented through such products as Hazelcast, Apache Ignite, and Oracle Coherence. 处理单元(如图 15-3 所示)包含应用逻辑(或应用逻辑的部分)。这通常包括基于 Web 的组件以及后端业务逻辑。处理单元的内容根据应用程序的类型而有所不同。较小的基于 Web 的应用程序可能会部署到单个处理单元中,而较大的应用程序可能会根据应用程序的功能区域将应用功能拆分为多个处理单元。处理单元还可以包含小型的单一目的服务(如微服务)。除了应用逻辑,处理单元还包含一个内存数据网格和复制引擎,通常通过如 Hazelcast、Apache Ignite 和 Oracle Coherence 等产品实现。
Figure 15-3. Processing unit 图 15-3. 处理单元
Virtualized Middleware 虚拟化中间件
The virtualized middleware handles the infrastructure concerns within the architecture that control various aspects of data synchronization and request handling. The components that make up the virtualized middleware include a messaging grid, data grid, processing grid, and deployment manager. These components, which are described in detail in the next sections, can be custom written or purchased as third-party products. 虚拟化中间件处理架构内的基础设施问题,控制数据同步和请求处理的各个方面。构成虚拟化中间件的组件包括消息网格、数据网格、处理网格和部署管理器。这些组件将在接下来的部分中详细描述,可以自定义编写或作为第三方产品购买。
Messaging grid 消息网格
The messaging grid, shown in Figure 15-4, manages input requests and session state. When a request comes into the virtualized middleware, the messaging grid component determines which active processing units are available to receive the request and forwards the request to one of those processing units. The complexity of the messaging grid can range from a simple round-robin algorithm to a more complex next-available algorithm that keeps track of which request is being processed by which processing unit. This component is usually implemented using a typical web server with load-balancing capabilities (such as HAProxy and Nginx). 消息网格,如图 15-4 所示,管理输入请求和会话状态。当请求进入虚拟化中间件时,消息网格组件确定哪些活动处理单元可用以接收请求,并将请求转发给其中一个处理单元。消息网格的复杂性可以从简单的轮询算法到更复杂的下一个可用算法,后者跟踪哪个请求正在被哪个处理单元处理。该组件通常使用具有负载均衡能力的典型 Web 服务器实现(如 HAProxy 和 Nginx)。
Figure 15-4. Messaging grid 图 15-4. 消息传递网格
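The simpler end of that range, round-robin routing, fits in a few lines. The sketch below assumes processing units are identified by plain strings and that the set of active units is fixed at construction time; `RoundRobinMessagingGrid` is a hypothetical class, not part of any load balancer product.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Round-robin request routing across the active processing units (hypothetical class). */
class RoundRobinMessagingGrid {
    private final List<String> processingUnits;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinMessagingGrid(List<String> activeProcessingUnits) {
        if (activeProcessingUnits.isEmpty()) {
            throw new IllegalArgumentException("at least one processing unit is required");
        }
        this.processingUnits = List.copyOf(activeProcessingUnits);
    }

    /** Returns the processing unit that should receive the next request. */
    String route() {
        // floorMod keeps the index valid even after the counter overflows.
        int index = Math.floorMod(counter.getAndIncrement(), processingUnits.size());
        return processingUnits.get(index);
    }
}
```

A next-available algorithm would replace the counter with bookkeeping about which unit is currently busy, at the cost of tracking request completion.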
Data grid 数据网格
The data grid component is perhaps the most important and crucial component in this architecture style. In most modern implementations the data grid is implemented solely within the processing units as a replicated cache. However, for those replicated caching implementations that require an external controller, or when using a distributed cache, this functionality would reside in both the processing units as well as in the data grid component within the virtualized middleware. Since the messaging grid can forward a request to any of the processing units available, it is essential that each processing unit contains exactly the same data in its in-memory data grid. Although Figure 15-5 shows a synchronous data replication between processing units, in reality this is done asynchronously and very quickly, usually completing the data synchronization in less than 100 milliseconds. 数据网格组件可能是这种架构风格中最重要和关键的组件。在大多数现代实现中,数据网格仅在处理单元内作为复制缓存实现。然而,对于那些需要外部控制器的复制缓存实现,或者在使用分布式缓存时,这一功能将同时存在于处理单元和虚拟化中间件内的数据网格组件中。由于消息网格可以将请求转发到任何可用的处理单元,因此每个处理单元的内存数据网格中必须包含完全相同的数据。尽管图 15-5 显示了处理单元之间的同步数据复制,但实际上这是异步且非常快速地完成的,通常在不到 100 毫秒的时间内完成数据同步。
Figure 15-5. Data grid 图 15-5. 数据网格
Data is synchronized between processing units that contain the same named data grid. To illustrate this point, consider the following code in Java using Hazelcast that creates an internal replicated data grid for processing units containing customer profile information: 数据在包含相同名称数据网格的处理单元之间进行同步。为了说明这一点,考虑以下使用 Hazelcast 的 Java 代码,该代码为包含客户档案信息的处理单元创建一个内部复制数据网格:
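As a rough stand-in for that listing, the sketch below models a named replicated cache using only the Java standard library. It does not use Hazelcast: `LocalDataGrid`, `joinReplicatedMap`, and `put` are hypothetical names, replication is applied synchronously within one process, and values are plain strings. With Hazelcast itself, the comparable calls would be along the lines of `Hazelcast.newHazelcastInstance()` followed by `getReplicatedMap("CustomerProfile")` on the returned instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-process stand-in for a replicated data grid. This is not the Hazelcast API. */
class LocalDataGrid {
    // Cache name -> the replica held by each processing unit that asked for that name.
    private final Map<String, List<Map<String, String>>> replicasByName = new HashMap<>();

    /** Returns a new replica of the named cache, seeded from an existing replica if any. */
    Map<String, String> joinReplicatedMap(String name) {
        List<Map<String, String>> replicas =
                replicasByName.computeIfAbsent(name, key -> new ArrayList<>());
        Map<String, String> replica = new HashMap<>();
        if (!replicas.isEmpty()) {
            replica.putAll(replicas.get(0));   // load current data from a peer
        }
        replicas.add(replica);
        return replica;
    }

    /** Applies an update to every replica that shares the cache name. */
    void put(String name, String key, String value) {
        for (Map<String, String> replica : replicasByName.getOrDefault(name, List.of())) {
            replica.put(key, value);
        }
    }
}
```

Two processing units that both call `joinReplicatedMap("CustomerProfile")` end up with the same contents, while a unit that joins a differently named cache sees none of that data.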
All processing units needing access to the customer profile information would contain this code. Changes made to the CustomerProfile named cache from any of the processing units would have that change replicated to all other processing units containing that same named cache. A processing unit can contain as many replicated caches as needed to complete its work. Alternatively, one processing unit can make a remote call to another processing unit to ask for data (choreography) or leverage the processing grid (described in the next section) to orchestrate the request. 所有需要访问客户档案信息的处理单元都将包含此代码。任何处理单元对名为 CustomerProfile 的缓存所做的更改都会在所有包含该同名缓存的其他处理单元中复制该更改。一个处理单元可以包含任意数量的复制缓存以完成其工作。或者,一个处理单元可以向另一个处理单元发起远程调用以请求数据(编排),或利用处理网格(在下一节中描述)来协调请求。
Data replication within the processing units also allows service instances to come up and down without having to read data from the database, providing there is at least one instance containing the named replicated cache. When a processing unit instance comes up, it connects to the cache provider (such as Hazelcast) and makes a request to get the named cache. Once the connection is made to the other processing units, the cache will be loaded from one of the other instances. 处理单元内的数据复制还允许服务实例在不必从数据库读取数据的情况下启动和关闭,只要至少有一个实例包含命名的复制缓存。当处理单元实例启动时,它会连接到缓存提供者(如 Hazelcast)并请求获取命名缓存。一旦与其他处理单元建立连接,缓存将从其他实例之一加载。
Each processing unit knows about all other processing unit instances through the use of a member list. The member list contains the IP address and ports of all other processing units using that same named cache. For example, suppose there is a single processing instance containing code and replicated cached data for the customer profile. In this case there is only one instance, so the member list for that instance only contains itself, as illustrated in the following logging statements generated using Hazelcast: 每个处理单元通过使用成员列表了解所有其他处理单元实例。成员列表包含使用相同命名缓存的所有其他处理单元的 IP 地址和端口。例如,假设有一个单一的处理实例,其中包含客户档案的代码和复制的缓存数据。在这种情况下,只有一个实例,因此该实例的成员列表仅包含它自己,如以下使用 Hazelcast 生成的日志语句所示:
Instance 1:
Members {size:1, ver:1} [
Member [172.19.248.89]:5701 - 04a6f863-dfce-41e5-9d51-9f4e356ef268 this
]
When another processing unit starts up with the same named cache, the member lists of both services are updated to reflect the IP address and port of each processing unit:
Instance 1:
Members {size:2, ver:2} [
Member [172.19.248.89]:5701 - 04a6f863-dfce-41e5-9d51-9f4e356ef268 this
Member [172.19.248.90]:5702 - ea9e4dd5-5cb3-4b27-8fe8-db5cc62c7316
]
Instance 2:
Members {size:2, ver:2} [
Member [172.19.248.89]:5701 - 04a6f863-dfce-41e5-9d51-9f4e356ef268
Member [172.19.248.90]:5702 - ea9e4dd5-5cb3-4b27-8fe8-db5cc62c7316 this
]
When a third processing unit starts up, the member lists of instance 1 and instance 2 are both updated to reflect the new third instance:
Instance 1:
Members {size:3, ver:3} [
Member [172.19.248.89]:5701 - 04a6f863-dfce-41e5-9d51-9f4e356ef268 this
Member [172.19.248.90]:5702 - ea9e4dd5-5cb3-4b27-8fe8-db5cc62c7316
Member [172.19.248.91]:5703 - 1623eadf-9cfb-4b83-9983-d80520cef753
]
Instance 2:
Members {size:3, ver:3} [
Member [172.19.248.89]:5701 - 04a6f863-dfce-41e5-9d51-9f4e356ef268
Member [172.19.248.90]:5702 - ea9e4dd5-5cb3-4b27-8fe8-db5cc62c7316 this
Member [172.19.248.91]:5703 - 1623eadf-9cfb-4b83-9983-d80520cef753
]
Instance 3:
Members {size:3, ver:3} [
Member [172.19.248.89]:5701 - 04a6f863-dfce-41e5-9d51-9f4e356ef268
Member [172.19.248.90]:5702 - ea9e4dd5-5cb3-4b27-8fe8-db5cc62c7316
Member [172.19.248.91]:5703 - 1623eadf-9cfb-4b83-9983-d80520cef753 this
]
Notice that all three instances know about each other (including themselves). Suppose instance 1 receives a request to update the customer profile information. When instance 1 updates the cache with a cache.put() or similar cache update method, the data grid (such as Hazelcast) asynchronously applies the same update to the other replicated caches, keeping all three customer profile caches in sync with one another after a short replication delay.
When processing unit instances go down, all other processing units are automatically updated to reflect the lost member. For example, if instance 2 goes down, the member lists of instances 1 and 3 are updated as follows:
Instance 1:
Members {size:2, ver:4} [
Member [172.19.248.89]:5701 - 04a6f863-dfce-41e5-9d51-9f4e356ef268 this
Member [172.19.248.91]:5703 - 1623eadf-9cfb-4b83-9983-d80520cef753
]
Instance 3:
Members {size:2, ver:4} [
Member [172.19.248.89]:5701 - 04a6f863-dfce-41e5-9d51-9f4e356ef268
Member [172.19.248.91]:5703 - 1623eadf-9cfb-4b83-9983-d80520cef753 this
]
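The membership behavior in these logs can be sketched in a few lines of Python. This is a toy model and not Hazelcast's implementation: the `MemberList` class, its methods, and the shortened UUIDs are hypothetical, and the only behavior taken from the logs is that the size tracks the live members while the version rises on every join or departure.

```python
from dataclasses import dataclass, field


@dataclass
class MemberList:
    """Toy cluster view: one versioned list of (ip, port, uuid) members."""
    members: list = field(default_factory=list)
    version: int = 0

    def join(self, ip, port, uuid):
        self.members.append((ip, port, uuid))
        self.version += 1  # every membership change bumps the version

    def leave(self, uuid):
        self.members = [m for m in self.members if m[2] != uuid]
        self.version += 1

    def render(self, local_uuid):
        """Format the view as seen by one member, marking itself with 'this'."""
        lines = [f"Members {{size:{len(self.members)}, ver:{self.version}}} ["]
        for ip, port, uuid in self.members:
            suffix = " this" if uuid == local_uuid else ""
            lines.append(f"Member [{ip}]:{port} - {uuid}{suffix}")
        lines.append("]")
        return "\n".join(lines)


cluster = MemberList()
cluster.join("172.19.248.89", 5701, "04a6f863")   # shortened, illustrative UUIDs
cluster.join("172.19.248.90", 5702, "ea9e4dd5")
cluster.join("172.19.248.91", 5703, "1623eadf")
cluster.leave("ea9e4dd5")                          # instance 2 goes down
print(cluster.render("04a6f863"))                  # the view from instance 1
```

After three joins and one departure the view reports two members at version 4, matching the last pair of log listings.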
Processing grid
The processing grid, illustrated in Figure 15-6, is an optional component within the virtualized middleware that manages orchestrated request processing when multiple processing units are involved in a single business request. If a request requires coordination between processing unit types (e.g., an order processing unit and a payment processing unit), the processing grid mediates and orchestrates the request between those two processing units.
Figure 15-6. Processing grid
Deployment manager
The deployment manager component manages the dynamic startup and shutdown of processing unit instances based on load conditions. This component continually monitors response times and user loads, starts up new processing units when load increases, and shuts down processing units when load decreases. It is a critical component for achieving the variable scalability (elasticity) an application needs.
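As a rough illustration of the decision a deployment manager makes on each monitoring cycle, the following Python sketch returns a target instance count. The function name, the thresholds, and the one-instance-at-a-time policy are all assumptions made for this example; real products apply their own policies.

```python
def desired_instances(current, avg_response_ms, users_per_instance,
                      max_response_ms=500, min_users_per_instance=50,
                      min_instances=1, max_instances=20):
    """Return the instance count a toy deployment manager would aim for.

    All thresholds are hypothetical defaults chosen for illustration.
    """
    if avg_response_ms > max_response_ms:
        target = current + 1      # load is up: start another processing unit
    elif users_per_instance < min_users_per_instance:
        target = current - 1      # load is down: shut one processing unit down
    else:
        target = current          # within bounds: leave the topology alone
    return max(min_instances, min(max_instances, target))


scale_up = desired_instances(current=3, avg_response_ms=800, users_per_instance=200)
scale_down = desired_instances(current=3, avg_response_ms=100, users_per_instance=10)
```

Clamping the result between a minimum and maximum keeps at least one instance alive, which matters here because a named replicated cache survives only while some instance still holds it.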
Data Pumps
A data pump is a way of sending data to another processor, which then updates the data in a database. Data pumps are a necessary component within space-based architecture because processing units do not directly read from or write to a database. Data pumps within a space-based architecture are always asynchronous, providing eventual consistency between the in-memory cache and the database. When a processing unit instance receives a request and updates its cache, that processing unit becomes the owner of the update and is therefore responsible for sending that update through the data pump so that the database is eventually updated.
Data pumps are usually implemented using messaging, as shown in Figure 15-7. Messaging is a good fit for data pumps in a space-based architecture: not only does it support asynchronous communication, but it also supports guaranteed delivery and preserves message order through first-in, first-out (FIFO) queueing. Furthermore, messaging decouples the processing unit from the data writer, so that if the data writer is not available, processing can continue uninterrupted within the processing units.
Figure 15-7. Data pump used to send data to a database
In most cases there are multiple data pumps, each one usually dedicated to a particular domain or subdomain (such as customer or inventory). Data pumps can be dedicated to each type of cache (such as CustomerProfile, CustomerWishlist, and so on), or they can be dedicated to a processing unit domain (such as Customer) containing a much larger and more general cache.
Data pumps usually have associated contracts, including an action associated with the contract data (add, delete, or update). The contract can be a JSON schema, an XML schema, an object, or even a value-driven message (a map message containing name-value pairs). For updates, the data pump message usually contains only the new data values. For example, if a customer changes the phone number on their profile, only the new phone number is sent, along with the customer ID and an action to update the data.
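The phone number example can be sketched as follows. This is a minimal in-process illustration, not a real messaging setup: Python's `queue.Queue` stands in for a message queue, plain dictionaries stand in for the cache and the database, and the message field names (`action`, `customer_id`, `data`) are hypothetical.

```python
import json
import queue

customer_profile_cache = {"c-1001": {"name": "Pat", "phone": "555-0100"}}
database = {"c-1001": {"name": "Pat", "phone": "555-0100"}}
data_pump = queue.Queue()   # FIFO: messages leave in the order they were sent


def update_phone(customer_id, new_phone):
    """Processing unit side: update the cache, then publish only the change."""
    customer_profile_cache[customer_id]["phone"] = new_phone
    message = {"action": "update", "customer_id": customer_id,
               "data": {"phone": new_phone}}
    data_pump.put(json.dumps(message))   # returns immediately; no database call


def drain_pump():
    """Data writer side: apply queued messages to the database in order."""
    while not data_pump.empty():
        message = json.loads(data_pump.get())
        if message["action"] == "update":
            database[message["customer_id"]].update(message["data"])
        # "add" and "delete" actions would be handled in the same way


update_phone("c-1001", "555-0199")
stale = database["c-1001"]["phone"]   # still the old value until the writer runs
drain_pump()
fresh = database["c-1001"]["phone"]
```

Between the cache update and the call to `drain_pump()` the database still holds the old number, which is the eventual consistency window that asynchronous data pumps introduce.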
Data Writers
The data writer component accepts messages from a data pump and updates the database with the information contained in those messages (see Figure 15-7). Data writers can be implemented as services, applications, or data hubs (such as Ab Initio). The granularity of the data writers can vary based on the scope of the data pumps and processing units.
A domain-based data writer contains all of the database logic needed to handle every update within a particular domain (such as customer), regardless of the number of data pumps it accepts messages from. Notice in Figure 15-8 that there are four different processing units and four different data pumps representing the customer domain (Profile, WishList, Wallet, and Preferences) but only one data writer. The single customer data writer listens to all four data pumps and contains the necessary database logic (such as SQL) to update the customer-related data in the database.
Figure 15-8. Domain-based data writer
Alternatively, each class of processing unit can have its own dedicated data writer component, as illustrated in Figure 15-9. In this model each data writer is dedicated to its corresponding data pump and contains only the database processing logic for that particular processing unit (such as Wallet). While this model tends to produce many data writer components, it does provide better scalability and agility due to the alignment of processing unit, data pump, and data writer.
Figure 15-9. Dedicated data writers for each data pump
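The two granularities can be contrasted in a short Python sketch. The class names are hypothetical, `queue.Queue` stands in for each data pump, and the `executed` list stands in for statements run against the database.

```python
import queue

# One data pump per customer-domain processing unit, as in Figure 15-8.
PUMPS = {name: queue.Queue()
         for name in ("Profile", "WishList", "Wallet", "Preferences")}


class CustomerDataWriter:
    """Domain-based writer: one component with logic for all four pumps."""

    def __init__(self):
        self.executed = []   # stands in for statements run against the database

    def drain(self, pumps):
        for pump_name, pump in pumps.items():
            while not pump.empty():
                self.executed.append((pump_name, pump.get()))


class DedicatedDataWriter:
    """Dedicated writer: bound to exactly one pump and its database logic."""

    def __init__(self, pump_name):
        self.pump_name = pump_name
        self.executed = []

    def drain(self, pumps):
        pump = pumps[self.pump_name]
        while not pump.empty():
            self.executed.append((self.pump_name, pump.get()))


PUMPS["Wallet"].put({"action": "update", "customer_id": "c-1001"})
PUMPS["Profile"].put({"action": "update", "customer_id": "c-1001"})

wallet_writer = DedicatedDataWriter("Wallet")
wallet_writer.drain(PUMPS)     # consumes only the Wallet message
domain_writer = CustomerDataWriter()
domain_writer.drain(PUMPS)     # consumes whatever remains on any customer pump
```

A real system would pick one model per domain; both writers run here only to show the difference in scope.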
Data Readers
Whereas data writers take on the responsibility for updating the database, data readers take on the responsibility for reading data from the database and sending it to the processing units via a reverse data pump. In space-based architecture, data readers are invoked in only three situations: a crash of all processing unit instances of the same named cache, a redeployment of all processing units within the same named cache, or retrieval of archive data not contained in the replicated cache.
If all instances come down (due to a system-wide crash or a redeployment of all instances), data must be read from the database, something that is generally avoided in space-based architecture. When instances of a class of processing unit start coming up, each one tries to grab a lock on the cache. The first one to get the lock becomes the temporary cache owner; the others go into a wait state until the lock is released (this might vary based on the type of cache implementation being used, but regardless, there is one primary owner of the cache in this scenario). To load the cache, the temporary cache owner sends a message to a queue requesting data. The data reader component accepts the read request and performs the database query logic needed to retrieve the data for the processing unit. As the data reader queries data from the database, it sends that data to a different queue (called a reverse data pump). The temporary cache owner receives the data from the reverse data pump and loads the cache. Once all the data is loaded, the temporary owner releases the lock on the cache, all other instances are then synchronized, and processing can begin. This processing flow is illustrated in Figure 15-10.
Figure 15-10. Data reader with reverse data pump
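The cold-start flow can be walked through in a single-threaded Python sketch. Everything here is a stand-in chosen for illustration: `threading.Lock` plays the cache lock, two `queue.Queue` objects play the request queue and the reverse data pump, the `DONE` marker is a hypothetical end-of-data signal, and the data reader runs inline rather than as a separate component.

```python
import queue
import threading

database = {"c-1001": {"name": "Pat"}, "c-1002": {"name": "Sam"}}
read_requests = queue.Queue()       # processing unit -> data reader
reverse_data_pump = queue.Queue()   # data reader -> processing unit
cache_lock = threading.Lock()
DONE = object()                     # hypothetical end-of-data marker


def try_become_owner():
    """The first caller gets the lock and becomes the temporary cache owner."""
    return cache_lock.acquire(blocking=False)


def data_reader():
    """Serve one read request by streaming rows onto the reverse data pump."""
    read_requests.get()
    for key, row in database.items():
        reverse_data_pump.put((key, row))
    reverse_data_pump.put(DONE)


def load_cache(cache, requested_by):
    """Temporary owner: request the data, then load whatever comes back."""
    read_requests.put({"cache": "CustomerProfile", "requested_by": requested_by})
    data_reader()                   # inline here; normally a separate component
    while True:
        item = reverse_data_pump.get()
        if item is DONE:
            break
        key, row = item
        cache[key] = row


cache = {}
instance_1_owns = try_become_owner()   # first to ask: becomes temporary owner
instance_2_owns = try_become_owner()   # lock already held: must wait
load_cache(cache, "instance-1")
cache_lock.release()                   # other instances can now synchronize
```

Only the instance holding the lock queries the database, so a restart of every instance produces one bulk read per named cache rather than one per instance.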
Like data writers, data readers can be domain-based or dedicated to a specific class of processing unit (which is usually the case). The implementation options are also the same as for data writers: a service, an application, or a data hub.
The data writers and data readers essentially form what is usually known as a data abstraction layer (or, in some cases, a data access layer). The difference between the two lies in how much detailed knowledge the processing units have of the structure of the tables (or schema) in the database. A data access layer means that the processing units are coupled to the underlying data structures in the database and only use the data readers and writers to access the database indirectly. A data abstraction layer, on the other hand, means that the processing unit is decoupled from the underlying database table structures through separate contracts. Space-based architecture generally relies on a data abstraction layer model so that the replicated cache schema in each processing unit can differ from the underlying database table structures. This allows for incremental changes to the database without necessarily impacting the processing units. To facilitate this incremental change, the data writers and data readers contain transformation logic, so that if a column type changes or a column or table is dropped, the data readers and data writers can buffer the database change until the necessary changes can be made to the processing unit caches.
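A minimal sketch of such transformation logic in a data writer follows. The table name, the column names, and the `FIELD_TO_COLUMN` mapping are hypothetical; the point is only that the contract field names used by the processing unit need not match the database columns.

```python
# Hypothetical mapping from contract (cache) field names to database columns.
FIELD_TO_COLUMN = {"phone": "phone_number", "name": "full_name"}


def to_update_statement(message):
    """Translate a data pump message into a parameterized SQL UPDATE."""
    columns = [FIELD_TO_COLUMN[field] for field in message["data"]]
    assignments = ", ".join(f"{column} = ?" for column in columns)
    sql = f"UPDATE customer SET {assignments} WHERE customer_id = ?"
    params = list(message["data"].values()) + [message["customer_id"]]
    return sql, params


sql, params = to_update_statement(
    {"action": "update", "customer_id": "c-1001", "data": {"phone": "555-0199"}})
print(sql)      # UPDATE customer SET phone_number = ? WHERE customer_id = ?
print(params)   # ['555-0199', 'c-1001']
```

If the `phone_number` column were renamed, only the mapping inside the data writer would change; the processing unit and its cache schema would stay as they are.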
Data Collisions
When replicated caching is used in an active/active state, where updates can occur in any service instance containing the same named cache, replication latency creates the possibility of a data collision. A data collision occurs when data is updated in one cache instance (cache A) and, during replication to another cache instance (cache B), the same data is updated by that cache (cache B). In this scenario, the local update to cache B is overwritten through replication by the old data from cache A, and through replication the same data in cache A is overwritten by the update from cache B.
To illustrate this problem, assume there are two service instances (Service A and Service B) containing a replicated cache of product inventory. The following flow demonstrates the data collision problem:
1. The current inventory count for blue widgets is 500 units.
2. Service A updates the inventory cache for blue widgets to 490 units (10 sold).
3. During replication, Service B updates the inventory cache for blue widgets to 495 units (5 sold).
4. The Service B cache gets updated to 490 units due to replication from the Service A update.
5. The Service A cache gets updated to 495 units due to replication from the Service B update.
6. Both caches in Service A and B are incorrect and out of sync (inventory should be 485 units).
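The flow above can be reproduced with a small Python simulation. It is a sketch that assumes last-write-wins replication of whole values; the `in_flight` list is a hypothetical stand-in for updates delayed by replication latency.

```python
cache_a = {"blue_widget": 500}
cache_b = {"blue_widget": 500}
in_flight = []   # replication messages that were sent but not yet applied


def sell(local_cache, target_cache, item, quantity):
    """Update the local cache and queue the new value for the other cache."""
    local_cache[item] -= quantity
    in_flight.append((target_cache, item, local_cache[item]))


sell(cache_a, cache_b, "blue_widget", 10)   # Service A: 500 -> 490
sell(cache_b, cache_a, "blue_widget", 5)    # Service B: 500 -> 495, before A's update arrives

for target_cache, item, value in in_flight:   # replication latency elapses
    target_cache[item] = value                # each side is overwritten by the other

print(cache_a, cache_b)   # {'blue_widget': 495} {'blue_widget': 490}
```

Neither cache ends at 485, and the two caches disagree with each other, which is exactly the outcome described in the last step.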
Several factors influence how many data collisions might occur: the number of processing unit instances containing the same cache, the update rate of the cache, the cache size, and the replication latency of the caching product. The formula used to estimate probabilistically how many data collisions might occur based on these factors is as follows:
$$\text{CollisionRate} = N \times \frac{UR^{2}}{S} \times RL$$
where $N$ represents the number of service instances using the same named cache, $UR$ the update rate (squared in the formula), $S$ the cache size (in number of rows), and $RL$ the replication latency of the caching product. The worked examples that follow state $UR$ in updates per second and $RL$ in milliseconds; converting $RL$ to seconds gives collisions per second, which is multiplied by 3,600 to obtain the hourly rates shown.
This formula is useful for determining the percentage of data collisions that will likely occur and hence the feasibility of using replicated caching. For example, consider the following values for the factors involved in this calculation:
| Factor | Value |
| :--- | :--- |
| Update rate (UR): | 20 updates/second |
| Number of instances (N): | 5 |
| Cache size (S): | 50,000 rows |
| Replication latency (RL): | 100 milliseconds |
| Updates: | 72,000 per hour |
| Collision rate: | 14.4 per hour |
| Percentage: | 0.02% |
Applying these factors to the formula yields 72,000 updates an hour, with a high probability that 14 updates to the same data may collide. Given the low percentage (0.02%), replication would be a viable option.
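The arithmetic behind these figures can be checked with a short Python function. The unit handling is an assumption inferred from the published numbers rather than stated in the text: the update rate is taken in updates per second, the replication latency is converted from milliseconds to seconds, and the per-second result is scaled to an hour. The function and parameter names are hypothetical.

```python
def collisions_per_hour(instances, updates_per_second, cache_rows, latency_ms):
    """CollisionRate = N * UR^2 / S * RL, scaled from per second to per hour."""
    per_second = (instances * updates_per_second ** 2 / cache_rows
                  * (latency_ms / 1000))
    return per_second * 3600


updates_per_hour = 20 * 3600                     # 72,000 updates per hour
rate = collisions_per_hour(5, 20, 50_000, 100)   # about 14.4 collisions per hour
percentage = rate / updates_per_hour * 100       # about 0.02 percent
```

Under the same assumption, a latency of 1 millisecond gives about 0.144 collisions per hour, 2 instances give about 5.76, and a 10,000-row cache gives 72, in line with the variations discussed in this section.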
Varying the replication latency has a significant impact on the consistency of data. Replication latency depends on many factors, including the type of network and the physical distance between processing units. For this reason replication latency values are rarely published and must be derived from actual measurements in a production environment. If the actual replication latency is not available, the value used in the prior example (100 milliseconds), which we frequently use when estimating the number of data collisions, is a good planning number. Changing the replication latency from 100 milliseconds to 1 millisecond yields the same number of updates (72,000 per hour) but produces a probability of only 0.1 collisions per hour! This scenario is shown in the following table:
| Factor | Value |
| :--- | :--- |
| Update rate (UR): | 20 updates/second |
| Number of instances (N): | 5 |
| Cache size (S): | 50,000 rows |
| Replication latency (RL): | 1 millisecond (changed from 100) |
| Updates: | 72,000 per hour |
| Collision rate: | 0.1 per hour |
| Percentage: | 0.0002% |
The number of processing units containing the same named cache (represented by the number-of-instances factor) is also directly proportional to the number of data collisions possible. For example, reducing the number of processing units from 5 instances to 2 instances yields a data collision rate of only about 6 per hour out of 72,000 updates per hour:
| Factor | Value |
| :--- | :--- |
| Update rate (UR): | 20 updates/second |
| Number of instances (N): | 2 (changed from 5) |
| Cache size (S): | 50,000 rows |
| Replication latency (RL): | 100 milliseconds |
| Updates: | 72,000 per hour |
| Collision rate: | 5.8 per hour |
| Percentage: | 0.008% |
The cache size is the only factor that is inversely proportional to the collision rate: as the cache size decreases, collision rates increase. In our example, reducing the cache size from 50,000 rows to 10,000 rows (and keeping everything else the same as in the first example) yields a collision rate of 72 per hour, significantly higher than with 50,000 rows:
| Factor | Value |
| :--- | :--- |
| Update rate (UR): | 20 updates/second |
| Number of instances (N): | 5 |
| Cache size (S): | 10,000 rows (changed from 50,000) |
| Replication latency (RL): | 100 milliseconds |
| Updates: | 72,000 per hour |
| Collision rate: | 72.0 per hour |
| Percentage: | 0.1% |
Under normal circumstances, most systems do not have consistent update rates over such a long period of time. As such, when using this calculation it is helpful to understand the maximum update rate during peak usage and to calculate minimum, normal, and peak collision rates.
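A sketch of that minimum, normal, and peak calculation follows. The load profile is invented for illustration (only the 20 updates per second figure comes from the examples in this section), and the unit handling assumes the update rate is in updates per second with latency converted from milliseconds to seconds.

```python
def collisions_per_hour(instances, updates_per_second, cache_rows, latency_ms):
    """N * UR^2 / S * RL, with RL in seconds, scaled to an hourly figure."""
    return (instances * updates_per_second ** 2 / cache_rows
            * (latency_ms / 1000) * 3600)


# Hypothetical load profile in updates per second; measure your own system.
profile = {"minimum": 2, "normal": 20, "peak": 100}

report = {}
for label, rate in profile.items():
    collisions = collisions_per_hour(5, rate, 50_000, 100)
    percentage = collisions / (rate * 3600) * 100
    report[label] = (round(collisions, 1), round(percentage, 3))
```

Because the update rate is squared, a fivefold rise from normal to peak load multiplies the hourly collisions by 25 (from 14.4 to 360 in this profile), so the peak figure is the one that decides whether replication remains feasible.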
Cloud Versus On-Premises Implementations
Space-based architecture offers some unique options when it comes to the environments in which it is deployed. The entire topology, including the processing units, virtualized middleware, data pumps, data readers and writers, and the database, can be deployed within cloud-based environments or on-premises (“on-prem”). However, this architecture style can also be deployed between these environments, offering a unique feature not found in other architecture styles.
A powerful feature of this architecture style (as illustrated in Figure 15-11) is the ability to deploy applications via processing units and virtualized middleware in managed cloud-based environments while keeping the physical databases and corresponding data on-prem. This topology supports very effective cloud-based data synchronization due to the asynchronous data pumps and eventual consistency model of this architecture style. Transactional processing can occur in dynamic and elastic cloud-based environments while physical data management, reporting, and data analytics are preserved within secure, local on-prem environments.
Figure 15-11. Hybrid cloud-based and on-prem topology
Replicated Versus Distributed Caching
Space-based architecture relies on caching for the transactional processing of an application. Removing the need for direct reads and writes to a database is how space-based architecture is able to support high scalability, high elasticity, and high performance. Space-based architecture mostly relies on replicated caching, although distributed caching can be used as well.
With replicated caching, as illustrated in Figure 15-12, each processing unit contains its own in-memory data grid that is synchronized between all processing units using that same named cache. When an update occurs to a cache within any of the processing units, the other processing units are automatically updated with the new information.
Figure 15-12. Replicated caching between processing units
Replicated caching is not only extremely fast, but it also supports high levels of fault tolerance. Since there is no central server holding the cache, replicated caching does not have a single point of failure. There may be exceptions to this rule, however, depending on the caching product used. Some caching products require an external controller to monitor and control the replication of data between processing units, but most product companies are moving away from this model.
While replicated caching is the standard caching model for space-based architecture, there are some cases where it is not possible to use replicated caching. These situations include high data volumes (size of the cache) and high update rates to the cache data. Internal memory caches in excess of 100 MB might start to cause issues with elasticity and high scalability because of the amount of memory used by each processing unit. Processing units are generally deployed within a virtual machine (or in some cases represent the virtual machine). Each virtual machine has only a certain amount of memory available for internal cache usage, which limits the number of processing unit instances that can be started to handle high-throughput situations. Furthermore, as shown in “Data Collisions,” if the update rate of the cache data is too high, the data grid might be unable to keep up and ensure data consistency across all processing unit instances. When these situations occur, distributed caching can be used.
Distributed caching, as illustrated in Figure 15-13, requires an external server or service dedicated to holding a centralized cache. In this model the processing units do not store data in internal memory, but rather use a proprietary protocol to access the data from the central cache server. Distributed caching supports high levels of data consistency because the data is all in one place and does not need to be replicated. However, this model performs worse than replicated caching because the cache data must be accessed remotely, adding to the overall latency of the system. Fault tolerance is also an issue with distributed caching. If the cache server containing the data goes down, no data can be accessed or updated from any of the processing units, rendering them nonoperational. This risk can be mitigated by mirroring the distributed cache, but mirroring could present consistency issues if the primary cache server goes down unexpectedly and the data does not make it to the mirrored cache server.
Figure 15-13. Distributed caching between processing units
When the size of the cache is relatively small (under 100 MB) and the update rate of the cache is low enough that the replication engine of the caching product can keep up with the cache updates, the decision between using a replicated cache and a distributed cache becomes one of data consistency versus performance and fault tolerance. A distributed cache will always offer better data consistency than a replicated cache because the cached data is in a single place (as opposed to being spread across multiple processing units). However, performance and fault tolerance will always be better when using a replicated cache. Many times this decision comes down to the type of data being cached in the processing units. The need for highly consistent data (such as inventory counts of the available products) usually warrants a distributed cache, whereas data that does not change often (such as reference data like name/value pairs, product codes, and product descriptions) usually warrants a replicated cache for quick lookup. Some of the selection criteria that can be used as a guide for choosing between a distributed cache and a replicated cache are listed in Table 15-1.
Table 15-1. Distributed versus replicated caching
| Decision criteria | Replicated cache | Distributed cache |
| :--- | :--- | :--- |
| Optimization | Performance | Consistency |
| Cache size | Small (<100 MB) | Large (>500 MB) |
| Type of data | Relatively static | Highly dynamic |
| Update frequency | Relatively low | High update rate |
| Fault tolerance | High | Low |
When choosing the type of caching model to use with space-based architecture, remember that in most cases both models will be applicable within any given application context. In other words, neither replicated caching nor distributed caching solves every problem. Rather than seeking compromises through a single consistent caching model across the application, leverage each for its strengths. For example, for a processing unit that maintains the current inventory, choose a distributed caching model for data consistency; for a processing unit that maintains the customer profile, choose a replicated cache for performance and fault tolerance.
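The criteria in Table 15-1 can be turned into a rough per-processing-unit rule of thumb. The following Python sketch is one possible reading of the table, not a prescribed algorithm: the function name and parameters are hypothetical, and the handling of sizes between 100 MB and 500 MB is an assumption, since the table gives no guidance for that range.

```python
def choose_cache_model(cache_size_mb, high_update_rate, needs_strong_consistency):
    """Suggest a caching model using the Table 15-1 criteria as rough rules."""
    if needs_strong_consistency or high_update_rate or cache_size_mb > 500:
        return "distributed"
    if cache_size_mb < 100:
        return "replicated"
    return "either"   # 100-500 MB: not covered by the table; measure first


# Decide per processing unit rather than once for the whole application.
inventory = choose_cache_model(20, high_update_rate=True,
                               needs_strong_consistency=True)
customer_profile = choose_cache_model(40, high_update_rate=False,
                                      needs_strong_consistency=False)
```

Calling the function once per processing unit reflects the advice above: the inventory unit and the customer profile unit can end up with different models inside the same application.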
Near-Cache Considerations
A near-cache is a hybrid caching model that bridges in-memory data grids with a distributed cache. In this model (illustrated in Figure 15-14) the distributed cache is referred to as the full backing cache, and the in-memory data grid contained within each processing unit is referred to as the front cache. The front cache always contains a smaller subset of the full backing cache, and it leverages an eviction policy to remove older items so that newer ones can be added. The front cache can be what is known as a most recently used (MRU) cache containing the most recently used items or a most frequently used (MFU) cache containing the most frequently used items. Alternatively, a random replacement eviction policy can be used in the front cache so that items are removed at random when space is needed to add a new item. Random replacement (RR) is a good eviction policy when there is no clear analysis of the data to indicate whether to keep the most recently used or the most frequently used items.
Figure 15-14. Near-cache topology
While the front caches are always kept in sync with the full backing cache, the front caches contained within each processing unit are not synchronized between other processing units sharing the same data. This means that multiple processing units sharing the same data context (such as a customer profile) will likely all have different data in their front cache. This creates inconsistencies in performance and responsiveness between processing units because each processing unit contains different data in the front cache. For this reason we do not recommend using a near-cache model for space-based architecture. 虽然前端缓存始终与完整的后备缓存保持同步,但每个处理单元内的前端缓存并未在共享相同数据的其他处理单元之间同步。这意味着多个共享相同数据上下文(例如客户档案)的处理单元可能在其前端缓存中拥有不同的数据。这导致处理单元之间在性能和响应性方面的不一致,因为每个处理单元在前端缓存中包含不同的数据。因此,我们不建议在基于空间的架构中使用近缓存模型。
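The inconsistency described above can be illustrated with a small sketch. It assumes a plain dictionary standing in for the full backing cache and a hypothetical `FrontCache` class that keeps only the most recently used items (that is, it evicts the least recently used one when full). None of these names come from a real caching product; they exist only to show why two processing units answering the same request can behave differently.

```python
from collections import OrderedDict


class FrontCache:
    """Hypothetical per-processing-unit front cache that keeps the most recently used items."""

    def __init__(self, backing_cache, capacity):
        self.backing_cache = backing_cache  # stand-in for the full backing cache
        self.capacity = capacity
        self.items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.items:
            self.items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.items[key]
        self.misses += 1
        value = self.backing_cache[key]  # slower path: read from the backing cache
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item
        return value


backing = {f"customer-{i}": {"id": i} for i in range(100)}
unit_a = FrontCache(backing, capacity=3)
unit_b = FrontCache(backing, capacity=3)

# Each processing unit has served different customers so far.
for key in ["customer-1", "customer-2", "customer-3"]:
    unit_a.get(key)
for key in ["customer-7", "customer-8", "customer-9"]:
    unit_b.get(key)

# The same request now hits the front cache in one unit and misses in the other.
unit_a.get("customer-1")
unit_b.get("customer-1")
print(unit_a.hits, unit_b.hits)
```

Both units return the same data, because both read through to the same backing cache, but only one of them serves the request from memory. That difference in responsiveness is the inconsistency the text warns about.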
Implementation Examples 实现示例
Space-based architecture is well suited for applications that experience high spikes in user or request volume and applications that have throughput in excess of 10,000 concurrent users. Examples of space-based architecture include applications like online concert ticketing systems and online auction systems. Both of these examples require high performance, high scalability, and high levels of elasticity. 基于空间的架构非常适合用户或请求量出现高峰的应用程序,以及并发用户超过 10,000 的应用程序。基于空间的架构的例子包括在线音乐会票务系统和在线拍卖系统。这两个例子都需要高性能、高可扩展性和高弹性。
Concert Ticketing System 音乐会票务系统
Concert ticketing systems have a unique problem domain in that concurrent user volume is relatively low until a popular concert is announced. Once concert tickets go on sale, user volumes usually spike from several hundred concurrent users to several thousand (possibly in the tens of thousands, depending on the concert), all trying to acquire a ticket for the concert (hopefully, good seats!). Tickets usually sell out in a matter of minutes, requiring the kind of architecture characteristics supported by space-based architecture. 音乐会票务系统有一个独特的问题领域,即在宣布热门音乐会之前,同时在线用户的数量相对较低。一旦音乐会门票开始销售,用户数量通常会从几百个并发用户激增到几千个(可能达到数万个,具体取决于音乐会),所有人都在试图获取音乐会的门票(希望能买到好座位!)。门票通常在几分钟内售罄,这需要空间基础架构所支持的那种架构特性。
There are many challenges associated with this sort of system. First, there are only a certain number of tickets available, regardless of the seating preferences. Seating availability must continually be updated and made available as fast as possible given the high number of concurrent requests. Also, assuming assigned seats are an option, seating availability must also be updated as fast as possible. Continually accessing a central database synchronously for this sort of system would likely not work; it would be very difficult for a typical database to handle tens of thousands of concurrent requests through standard database transactions at this level of scale and update frequency. 这种系统面临许多挑战。首先,无论座位偏好如何,只有一定数量的票可用。座位的可用性必须不断更新,并尽可能快速地提供,因为并发请求的数量很高。此外,假设指定座位是一个选项,座位的可用性也必须尽可能快速地更新。对于这种系统,持续同步访问中央数据库可能行不通;对于典型数据库来说,以这种规模和更新频率处理成千上万的并发请求通过标准数据库事务将非常困难。
Space-based architecture would be a good fit for a concert ticketing system due to the high elasticity required of this type of application. An instantaneous increase in the number of concurrent users wanting to purchase concert tickets would be immediately recognized by the deployment manager, which in turn would start up a large number of processing units to handle the large volume of requests. Optimally, the deployment manager would be configured to start up the necessary number of processing units shortly before the tickets went on sale, so that those instances are on standby right before the significant increase in user load. 基于空间的架构非常适合音乐会票务系统,因为这种类型的应用程序对弹性要求很高。希望购买音乐会门票的并发用户数量的瞬时增加会被部署管理器立即识别,进而启动大量处理单元以处理大量请求。理想情况下,部署管理器会被配置在门票开售前不久启动必要数量的处理单元,因此在用户负载显著增加之前,这些实例会处于待命状态。
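The two behaviors described above, reacting to observed load and starting processing units shortly before a known sale, can be sketched as follows. The `DeploymentManager` class, its method names, and the figure of 500 users per processing unit are all assumptions made for illustration; a real deployment manager would start and stop actual instances rather than track a counter.

```python
import math


def units_needed(concurrent_users, users_per_unit, min_units=1):
    """Return how many processing units the given load requires."""
    return max(min_units, math.ceil(concurrent_users / users_per_unit))


class DeploymentManager:
    """Hypothetical deployment manager that only tracks the number of running units."""

    def __init__(self, users_per_unit, min_units=1):
        self.users_per_unit = users_per_unit
        self.min_units = min_units
        self.running_units = min_units

    def on_load_sample(self, concurrent_users):
        """Reactive scaling: match the unit count to the observed load."""
        self.running_units = units_needed(
            concurrent_users, self.users_per_unit, self.min_units
        )
        return self.running_units

    def pre_warm(self, expected_users):
        """Proactive scaling: start units ahead of an expected spike, never scale down."""
        target = units_needed(expected_users, self.users_per_unit, self.min_units)
        self.running_units = max(self.running_units, target)
        return self.running_units


manager = DeploymentManager(users_per_unit=500, min_units=2)
quiet = manager.on_load_sample(300)        # normal traffic before the announcement
standby = manager.pre_warm(20_000)         # shortly before tickets go on sale
on_sale = manager.on_load_sample(18_250)   # observed load once the sale starts
print(quiet, standby, on_sale)
```

Pre-warming trades some idle capacity for the guarantee that the units are already running when the spike arrives, which is the optimal configuration the paragraph describes.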
Online Auction System 在线拍卖系统
Online auction systems (bidding on items within an auction) share the same sort of characteristics as the online concert ticketing systems described previously: both require high levels of performance and elasticity, and both have unpredictable spikes in user and request load. When an auction starts, there is no way of determining how many people will be joining the auction, and of those people, how many concurrent bids will occur for each asking price. 在线拍卖系统(对拍卖中的物品进行竞标)与之前描述的在线音乐会票务系统具有相似的特征:两者都需要高水平的性能和弹性,并且都存在用户和请求负载的不可预测峰值。当拍卖开始时,无法确定将有多少人参与拍卖,以及在这些人中,将有多少个并发竞标会发生在每个要价上。
Space-based architecture is well suited for this type of problem domain in that multiple processing units can be started as the load increases; and as the auction winds down, unused processing units could be destroyed. Individual processing units can be devoted to each auction, ensuring consistency with bidding data. Also, due to the asynchronous nature of the data pumps, bidding data can be sent to other processing (such as bid history, bid analytics, and auditing) without much latency, therefore increasing the overall performance of the bidding process. 基于空间的架构非常适合这种问题领域,因为随着负载的增加,可以启动多个处理单元;而随着拍卖的结束,未使用的处理单元可以被销毁。每个拍卖可以专门分配一个处理单元,以确保与竞标数据的一致性。此外,由于数据泵的异步特性,竞标数据可以在几乎没有延迟的情况下发送到其他处理(如竞标历史、竞标分析和审计),从而提高竞标过程的整体性能。
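A minimal sketch of the idea, assuming one hypothetical `AuctionProcessingUnit` per auction and Python's standard `queue.Queue` standing in for the data pump. The bid is accepted against in-memory state and the event is handed to the pump without waiting for any downstream processing; a separate consumer (here the hypothetical `drain` function) later forwards events to bid history, analytics, or auditing.

```python
import queue


class AuctionProcessingUnit:
    """Hypothetical processing unit devoted to a single auction."""

    def __init__(self, auction_id, data_pump):
        self.auction_id = auction_id
        self.data_pump = data_pump
        self.highest_bid = 0  # in-memory state owned by this unit only

    def place_bid(self, bidder, amount):
        if amount <= self.highest_bid:
            return False  # rejected: not higher than the current bid
        self.highest_bid = amount
        # Hand the event to the data pump and return immediately.
        self.data_pump.put_nowait(
            {"auction": self.auction_id, "bidder": bidder, "amount": amount}
        )
        return True


def drain(data_pump, bid_history):
    """Consume pending events, as a bid-history service might do later."""
    while True:
        try:
            event = data_pump.get_nowait()
        except queue.Empty:
            return
        bid_history.append(event)


pump = queue.Queue()
unit = AuctionProcessingUnit("auction-42", pump)
results = [
    unit.place_bid("ann", 100),
    unit.place_bid("bob", 90),
    unit.place_bid("cy", 120),
]

history = []
drain(pump, history)
print(results, unit.highest_bid, len(history))
```

Because a single unit owns the bidding state for its auction, bids are compared against one consistent value, while the history consumer can lag behind without slowing the bidder.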
Architecture Characteristics Ratings 架构特征评级
A one-star rating in the characteristics ratings table in Figure 15-15 means the specific architecture characteristic isn't well supported in the architecture, whereas a five-star rating means the architecture characteristic is one of the strongest features in the architecture style. The definition for each characteristic identified in the scorecard can be found in Chapter 4. 在图 15-15 的特征评分表中,一星评级意味着特定的架构特征在架构中支持不佳,而五星评级则意味着该架构特征是架构风格中最强的特征之一。评分卡中识别的每个特征的定义可以在第 4 章找到。
| Architecture characteristic 架构特征 | Star rating 星级评分 |
| :---: | :---: |
| Partitioning type 分区类型 | Domain and technical 领域和技术 |
| Number of quanta 量子数 | 1 to many 一对多 |
| Deployability 可部署性 | [rating not legible] |
| Elasticity 弹性 | ★★★★★ |
| Evolutionary 演化的 | [rating not legible] |
| Fault tolerance 容错 | [rating not legible] |
| Modularity 模块化 | [rating not legible] |
| Overall cost 总体成本 | [rating not legible] |
| Performance 性能 | ★★★★★ |
| Reliability 可靠性 | [rating not legible] |
| Scalability 可扩展性 | ★★★★★ |
| Simplicity 简单性 | [rating not legible] |
| Testability 可测试性 | ★ |
Figure 15-15. Space-based architecture characteristics ratings 图 15-15. 基于空间的架构特性评级
Notice that space-based architecture maximizes elasticity, scalability, and performance (all five-star ratings). These are the driving attributes and main advantages of this architecture style. High levels of all three of these architecture characteristics are achieved by leveraging in-memory data caching and removing the database as a constraint. As a result, processing millions of concurrent users is possible using this architecture style. 请注意,基于空间的架构最大化了弹性、可扩展性和性能(所有五颗星评级)。这些是这种架构风格的驱动属性和主要优势。通过利用内存数据缓存并消除数据库作为约束,可以实现这三种架构特征的高水平。因此,使用这种架构风格可以处理数百万的并发用户。
While high levels of elasticity, scalability, and performance are advantages in this architecture style, there is a trade-off for this advantage, specifically with regard to overall simplicity and testability. Space-based architecture is a very complicated architecture style due to the use of caching and eventual consistency of the primary data store, which is the ultimate system of record. Care must be taken to ensure no data is lost in the event of a crash in any of the numerous moving parts of this architecture style (see “Preventing Data Loss” on page 201 in Chapter 14). 虽然高水平的弹性、可扩展性和性能是这种架构风格的优势,但这种优势是有代价的,特别是在整体简单性和可测试性方面。基于空间的架构是一种非常复杂的架构风格,因为它使用了缓存和主数据存储的最终一致性,而主数据存储是最终的记录系统。必须小心确保在这种架构风格的众多活动部分发生崩溃时不会丢失任何数据(请参见第 14 章第 201 页的“防止数据丢失”)。
Testing gets a one-star rating due to the complexity involved with simulating the high levels of scalability and elasticity supported in this architecture style. Testing hundreds of thousands of concurrent users at peak load is a very complicated and expensive task, and as a result most high-volume testing occurs within production environments with actual extreme load. This produces significant risk for normal operations within a production environment. 由于模拟这种架构风格所支持的高水平可扩展性和弹性所涉及的复杂性,测试获得了一星评级。在峰值负载下测试数十万的并发用户是一项非常复杂且昂贵的任务,因此大多数高容量测试发生在实际极端负载的生产环境中。这对生产环境中的正常操作产生了重大风险。
Cost is another factor when choosing this architecture style. Space-based architecture is relatively expensive, mostly due to licensing fees for caching products and high resource utilization within cloud and on-prem systems due to high scalability and elasticity. 成本是选择这种架构风格时的另一个因素。基于空间的架构相对昂贵,主要是由于缓存产品的许可费用以及由于高可扩展性和弹性而导致的云和本地系统中的高资源利用率。
It is difficult to identify the partitioning type of space-based architecture, and as a result we have identified it as both domain partitioned and technically partitioned. Space-based architecture is domain partitioned not only because it aligns itself with a specific type of domain (highly elastic and scalable systems), but also because of the flexibility of the processing units. Processing units can act as domain services in the same way services are defined in a service-based architecture or microservices architecture. At the same time, space-based architecture is technically partitioned in the way it separates the concerns about transactional processing using caching from the actual storage of the data in the database via data pumps. The processing units, data pumps, data readers and writers, and the database all form a technical layering in terms of how requests are processed, much as a monolithic n-tiered layered architecture is structured. 很难确定基于空间的架构的分区类型,因此我们将其识别为域分区和技术分区。基于空间的架构之所以是域分区,不仅因为它与特定类型的域(高度弹性和可扩展的系统)对齐,还因为处理单元的灵活性。处理单元可以像在基于服务的架构或微服务架构中定义的服务一样,充当域服务。同时,基于空间的架构在技术上是分区的,因为它通过数据泵将事务处理使用缓存的关注点与数据库中数据的实际存储分开。处理单元、数据泵、数据读取器和写入器以及数据库在请求处理的方式上形成了技术层次结构,这与单体的 n 层分层架构的结构非常相似。
The number of quanta within space-based architecture can vary based on how the user interface is designed and how communication happens between processing units. Because the processing units do not communicate synchronously with the database, the database itself is not part of the quantum equation. As a result, quanta within a space-based architecture are typically delineated through the association between the various user interfaces and the processing units. Processing units that synchronously communicate with each other (or synchronously through the processing grid for orchestration) would all be part of the same architectural quantum. 基于空间的架构中的量子数量可以根据用户界面的设计和处理单元之间的通信方式而有所不同。由于处理单元与数据库之间的通信不是同步的,因此数据库本身不属于量子方程的一部分。因此,基于空间的架构中的量子通常通过各种用户界面与处理单元之间的关联来划分。同步通信的处理单元(或通过处理网格进行同步以进行编排)将全部属于同一个架构量子。
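One way to picture the rule in the last sentence is to treat processing units as nodes and synchronous calls as edges, then count the connected groups. The sketch below does that with a small union-find; the unit names and the `count_quanta` function are hypothetical, and the model is a deliberate simplification that ignores the user-interface association the paragraph also mentions.

```python
def count_quanta(processing_units, synchronous_calls):
    """Count groups of processing units linked by synchronous communication."""
    parent = {unit: unit for unit in processing_units}

    def find(unit):
        # Follow parent links to the group representative, compressing the path.
        while parent[unit] != unit:
            parent[unit] = parent[parent[unit]]
            unit = parent[unit]
        return unit

    for caller, callee in synchronous_calls:
        parent[find(caller)] = find(callee)  # merge the two groups

    return len({find(unit) for unit in processing_units})


units = ["ordering", "payment", "inventory", "profile"]

independent = count_quanta(units, [])
one_sync_pair = count_quanta(units, [("ordering", "payment")])
chained = count_quanta(units, [("ordering", "payment"), ("payment", "inventory")])
print(independent, one_sync_pair, chained)
```

Asynchronous communication is simply left out of `synchronous_calls`, which mirrors why the database, reached only through asynchronous data pumps, does not join any quantum.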
CHAPTER 16 第 16 章
Orchestration-Driven Service-Oriented Architecture 以编排驱动的服务导向架构
Architecture styles, like art movements, must be understood in the context of the era in which they evolved, and this architecture exemplifies this rule more than any other. The combination of external forces that often influence architecture decisions, combined with a logical but ultimately disastrous organizational philosophy, doomed this architecture to irrelevance. However, it provides a great example of how a particular organizational idea can make logical sense yet hinder the most important parts of the development process. 架构风格,像艺术运动一样,必须在其演变的时代背景下理解,而这种架构更是体现了这一规则。外部力量的结合常常影响架构决策,加上一个逻辑上但最终灾难性的组织哲学,使得这种架构注定要被遗忘。然而,它提供了一个很好的例子,说明一个特定的组织理念如何在逻辑上是合理的,却阻碍了开发过程中的最重要部分。
History and Philosophy 历史与哲学
This style of service-oriented architecture appeared just as companies were becoming enterprises in the late 1990s: merging with smaller companies, growing at a breakneck pace, and requiring more sophisticated IT to accommodate this growth. However, computing resources were scarce, precious, and commercial. Distributed computing had just become possible and necessary, and many companies needed the variable scalability and other beneficial characteristics. 这种服务导向架构的风格出现在 1990 年代末,当时公司正逐渐成为企业:与小公司合并,以惊人的速度增长,并需要更复杂的 IT 来适应这种增长。然而,计算资源稀缺、珍贵且商业化。分布式计算刚刚变得可能和必要,许多公司需要可变的可扩展性和其他有利特性。
Many external drivers forced architects in this era toward distributed architectures with significant constraints. Before open source operating systems were thought reliable enough for serious work, operating systems were expensive and licensed per machine. Similarly, commercial database servers came with Byzantine licensing schemes, which caused application server vendors (which offered database connection pooling) to battle with database vendors. Thus, architects were expected to reuse as much as possible. In fact, reuse in all forms became the dominant philosophy in this architecture, the side effects of which we cover in “Reuse…and Coupling” on page 239. 许多外部驱动因素迫使这一时代的架构师朝着具有重大约束的分布式架构发展。在开源操作系统被认为足够可靠以进行严肃工作之前,操作系统的费用昂贵,并且按机器授权。同样,商业数据库服务器也伴随着复杂的许可方案,这导致提供数据库连接池的应用服务器供应商与数据库供应商之间的斗争。因此,架构师被期望尽可能多地重用。事实上,各种形式的重用成为了这种架构的主导理念,其副作用我们在第 239 页的“重用……和耦合”中进行了讨论。
This style of architecture also exemplifies how far architects can push the idea of technical partitioning, which had good motivations but bad consequences. 这种架构风格也体现了架构师在技术分区理念上可以推得多远,这个理念有良好的动机但却带来了不良后果。
Topology 拓扑
The topology of this type of service-oriented architecture is shown in Figure 16-1. 这种服务导向架构的拓扑如图 16-1 所示。
Figure 16-1. Topology of orchestration-driven service-oriented architecture 图 16-1. 以编排驱动的服务导向架构的拓扑
Not all examples of this style of architecture had the exact layers illustrated in Figure 16-1, but they all followed the same idea of establishing a taxonomy of services within the architecture, each layer with a specific responsibility. 并非所有这种架构风格的示例都有图 16-1 中所示的确切层次,但它们都遵循在架构中建立服务分类法的相同理念,每一层都有特定的责任。
Service-oriented architecture is a distributed architecture; the exact demarcation of boundaries isn’t shown in Figure 16-1 because it varied based on organization. 面向服务的架构是一种分布式架构;图 16-1 中没有显示边界的确切划分,因为它根据组织而有所不同。
Taxonomy 分类法
The architect’s driving philosophy in this architecture centered around enterprise-level reuse. Many large companies were annoyed at how much they had to continue to rewrite software, and they struck on a strategy to gradually solve that problem. Each layer of the taxonomy supported this goal. 该架构师在此架构中的核心理念围绕企业级重用。许多大型公司对他们必须不断重写软件感到恼火,因此他们制定了一项逐步解决该问题的策略。分类法的每一层都支持这一目标。
Business Services 业务服务
Business services sit at the top of this architecture and provide the entry point. For example, services like ExecuteTrade or PlaceOrder represent domain behavior. One litmus test was common at the time: could an architect answer affirmatively to the question “Are we in the business of…” for each of these services? 业务服务位于该架构的顶部,并提供入口点。例如,ExecuteTrade 或 PlaceOrder 等服务代表领域行为。一个当时常见的试金石是:架构师能否对“我们是否在从事……的业务?”这个问题对每个服务都回答肯定?
These service definitions contained no code, just input, output, and sometimes schema information. They were usually defined by business users, hence the name business services. 这些服务定义不包含代码,仅包含输入、输出,有时还包括模式信息。它们通常由业务用户定义,因此被称为业务服务。
Enterprise Services 企业服务
The enterprise services contain fine-grained, shared implementations. Typically, a team of developers is tasked with building atomic behavior around particular business domains: CreateCustomer, CalculateQuote, and so on. These services are the building blocks that make up the coarse-grained business services, tied together via the orchestration engine. 企业服务包含细粒度的共享实现。通常,一组开发人员负责围绕特定业务领域构建原子行为:CreateCustomer、CalculateQuote 等。这些服务是构成粗粒度业务服务的构件,通过编排引擎将它们连接在一起。
This separation of responsibility flows from the reuse goal in this architecture. If developers can build fine-grained enterprise services at just the correct level of granularity, the business won’t have to rewrite that part of the business workflow again. Gradually, the business will build up a collection of reusable assets in the form of reusable enterprise services. 这种责任的分离源于该架构中的重用目标。如果开发人员能够在恰当的粒度水平上构建细粒度的企业服务,业务就不必再次重写业务工作流的那部分。逐渐地,业务将积累一系列可重用的资产,形式为可重用的企业服务。
Unfortunately, the dynamic nature of reality defies these attempts. Business components aren’t like construction materials, where solutions last decades. Markets, technology changes, engineering practices, and a host of other factors confound attempts to impose stability on the software world. 不幸的是,现实的动态特性使这些尝试变得无效。商业组件不像建筑材料那样,解决方案可以持续数十年。市场、技术变化、工程实践以及其他许多因素使得在软件世界中强加稳定性变得复杂。
Application Services 应用服务
Not all services in the architecture require the same level of granularity or reuse as the enterprise services. Application services are one-off, single-implementation services. For example, perhaps one application needs geo-location, but the organization doesn’t want to take the time or effort to make that a reusable service. An application service, typically owned by a single application team, solves these problems. 并非架构中的所有服务都需要与企业服务相同的粒度或重用级别。应用服务是一次性、单一实现的服务。例如,某个应用可能需要地理定位,但组织不想花时间或精力将其制作成可重用的服务。应用服务通常由单个应用团队拥有,解决了这些问题。
Infrastructure Services 基础设施服务
Infrastructure services supply the operational concerns, such as monitoring, logging, authentication, and authorization. These services tend to be concrete implementations, owned by a shared infrastructure team that works closely with operations. 基础设施服务提供运营相关的关注点,如监控、日志记录、身份验证和授权。这些服务往往是具体的实现,由一个与运营紧密合作的共享基础设施团队负责。
Orchestration Engine 编排引擎
The orchestration engine forms the heart of this distributed architecture, stitching together the business service implementations using orchestration, including features like transactional coordination and message transformation. This architecture is typically tied to a single relational database, or a few, rather than a database per service as in microservices architectures. Thus, transactional behavior is handled declaratively in the orchestration engine rather than in the database. 编排引擎构成了这种分布式架构的核心,通过编排将业务服务实现连接在一起,包括事务协调和消息转换等功能。该架构通常与单个关系数据库或少数几个数据库相关联,而不是像微服务架构那样每个服务都有一个数据库。因此,事务行为是在编排引擎中以声明方式处理,而不是在数据库中处理。
The orchestration engine defines the relationship between the business and enterprise services, how they map together, and where transaction boundaries lie. It also acts as an integration hub, allowing architects to integrate custom code with package and legacy software systems. 编排引擎定义了业务和企业服务之间的关系,它们如何映射在一起,以及事务边界在哪里。它还充当集成中心,允许架构师将自定义代码与软件包和遗留软件系统集成。
Because this mechanism forms the heart of the architecture, Conway’s law (see “Conway’s Law” on page 103) correctly predicts that the team of integration architects responsible for this engine becomes a political force within an organization, and eventually a bureaucratic bottleneck. 因为这个机制构成了架构的核心,康威定律(见第 103 页的“康威定律”)正确地预测了负责这个引擎的集成架构师团队会成为组织内的政治力量,并最终成为官僚瓶颈。
While this approach might sound appealing, in practice it was mostly a disaster. Offloading transaction behavior to an orchestration tool sounded good, but finding the correct level of granularity of transactions became more and more difficult. While building a few services wrapped in a distributed transaction is possible, the architecture becomes increasingly complex as developers must figure out where the appropriate transaction boundaries lie between services. 虽然这种方法听起来很有吸引力,但在实践中大多数情况下都是灾难。将事务行为卸载到一个编排工具听起来不错,但找到事务的正确粒度变得越来越困难。虽然构建几个封装在分布式事务中的服务是可能的,但随着开发人员必须弄清楚服务之间适当的事务边界在哪里,架构变得越来越复杂。
Message Flow 消息流
All requests go through the orchestration engine, because it is the location within this architecture where logic resides. Thus, message flow goes through the engine even for internal calls, as shown in Figure 16-2. 所有请求都通过编排引擎,因为这是该架构中逻辑所在的位置。因此,消息流即使对于内部调用也会经过引擎,如图 16-2 所示。
Figure 16-2. Message flow with service-oriented architecture 图 16-2. 服务导向架构中的消息流
In Figure 16-2, the CreateQuote business-level service calls the service bus, which defines the workflow that consists of calls to CreateCustomer and CalculateQuote, each of which also has calls to application services. The service bus acts as the intermediary for all calls within this architecture, serving as both an integration hub and orchestration engine. 在图 16-2 中,CreateQuote 业务级服务调用服务总线,该总线定义了由调用 CreateCustomer 和 CalculateQuote 组成的工作流,每个调用也有对应用服务的调用。服务总线充当该架构中所有调用的中介,既作为集成中心,又作为编排引擎。
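The message flow just described can be sketched in a few lines. `CreateQuote`, `CreateCustomer`, and `CalculateQuote` are the names used in the text; the `OrchestrationEngine` class, the application services `ValidateAddress` and `LookUpRates`, and the payload fields are hypothetical and exist only to make the routing visible. The sketch models routing alone and leaves out transactional coordination and message transformation.

```python
class OrchestrationEngine:
    """Hypothetical engine: every call, including internal ones, is routed through it."""

    def __init__(self):
        self.services = {}   # service name -> implementation
        self.workflows = {}  # business service name -> ordered enterprise service names
        self.call_log = []

    def register_service(self, name, implementation):
        self.services[name] = implementation

    def define_workflow(self, business_service, steps):
        # Business services contain no code, only a workflow definition.
        self.workflows[business_service] = steps

    def call(self, name, payload):
        self.call_log.append(name)
        if name in self.workflows:
            for step in self.workflows[name]:
                payload = self.call(step, payload)
            return payload
        return self.services[name](self, payload)


def create_customer(engine, payload):
    payload = engine.call("ValidateAddress", payload)  # application service call
    return {**payload, "customer_id": 1}


def calculate_quote(engine, payload):
    payload = engine.call("LookUpRates", payload)  # application service call
    return {**payload, "quote": payload["coverage"] * payload["rate_percent"] // 100}


engine = OrchestrationEngine()
engine.register_service("CreateCustomer", create_customer)
engine.register_service("CalculateQuote", calculate_quote)
engine.register_service("ValidateAddress", lambda e, p: {**p, "address_valid": True})
engine.register_service("LookUpRates", lambda e, p: {**p, "rate_percent": 2})
engine.define_workflow("CreateQuote", ["CreateCustomer", "CalculateQuote"])

result = engine.call("CreateQuote", {"coverage": 10_000})
print(engine.call_log)
```

Every entry in `call_log` passed through the same object, which makes concrete why the engine becomes the single coupling point for the whole architecture.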
Reuse...and Coupling 重用...和耦合
A major goal of this architecture is reuse at the service level: the ability to gradually build business behavior that can be incrementally reused over time. Architects in this architecture were instructed to find reuse opportunities as aggressively as possible. For example, consider the situation illustrated in Figure 16-3. 该架构的一个主要目标是在服务层实现重用:逐步构建可以随着时间逐渐重用的业务行为。该架构中的架构师被指示尽可能积极地寻找重用机会。例如,考虑图 16-3 所示的情况。
Figure 16-3. Seeking reuse opportunities in service-oriented architecture 图 16-3. 在面向服务的架构中寻求重用机会
In Figure 16-3, an architect realizes that the divisions within an insurance company all contain a notion of Customer. Therefore, the proper strategy for service-oriented architecture entails extracting the customer parts into a reusable service and allowing the original services to reference the canonical Customer service, shown in Figure 16-4. 在图 16-3 中,架构师意识到保险公司内的每个部门都包含客户的概念。因此,面向服务的架构的正确策略是将客户部分提取到一个可重用的服务中,并允许原始服务引用规范的客户服务,如图 16-4 所示。
Figure 16-4. Building canonical representations in service-oriented architecture 图 16-4. 在面向服务的架构中构建规范表示
In Figure 16-4, the architect has isolated all customer behavior into a single Customer service, achieving obvious reuse goals. 在图 16-4 中,架构师将所有客户行为隔离到一个单一的客户服务中,实现了明显的重用目标。
However, architects only slowly realized the negative trade-offs of this design. First, when a team builds a system primarily around reuse, they also incur a huge amount of coupling between components. For example, in Figure 16-4, a change to the Customer service ripples out to all the other services, making change risky. Thus, in service-oriented architecture, architects struggled with making incremental change, because each change had a potentially huge ripple effect. That in turn led to the need for coordinated deployments, holistic testing, and other drags on engineering efficiency. 然而,架构师们只是慢慢意识到这种设计的负面权衡。首先,当一个团队主要围绕重用构建系统时,他们也会在组件之间产生大量耦合。例如,在图 16-4 中,对 Customer 服务的更改会波及到所有其他服务,使得更改变得风险很大。因此,在面向服务的架构中,架构师们在进行增量更改时面临困难,因为每次更改都有可能产生巨大的涟漪效应。这反过来又导致了对协调部署、整体测试和其他工程效率拖累的需求。
Another negative side effect of consolidating behavior into a single place: consider the case of auto and disability insurance in Figure 16-4. To support a single Customer service, it must include all the details the organization knows about customers. Auto insurance requires a driver’s license, which is a property of the person, not the vehicle. Therefore, the Customer service will have to include details about driver’s licenses that the disability insurance division cares nothing about. Yet, the team that deals with disability must deal with the extra complexity of a single customer definition. 将行为整合到一个地方的另一个负面副作用:考虑图 16-4 中的汽车和残疾保险的情况。为了支持单一的客户服务,它必须包括组织所知道的关于客户的所有细节。汽车保险需要驾驶执照,这是个人的属性,而不是车辆的属性。因此,客户服务将不得不包括残疾保险部门根本不关心的驾驶执照的细节。然而,处理残疾的团队必须应对单一客户定义带来的额外复杂性。
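A small sketch can make the extra complexity tangible. The canonical `Customer` type below, its fields, and the `USED_BY` mapping are all hypothetical; the point is only that a single shared definition forces each division to carry fields it never uses.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Customer:
    """Hypothetical canonical customer shared by every insurance division."""

    name: str
    address: str
    drivers_license: Optional[str] = None      # needed only by auto insurance
    disability_claim_id: Optional[str] = None  # needed only by disability insurance


# Which fields each division actually reads (illustrative assumption).
USED_BY = {
    "auto": {"name", "address", "drivers_license"},
    "disability": {"name", "address", "disability_claim_id"},
}


def unused_fields(division):
    """Return the canonical fields a division must carry but never uses."""
    all_fields = {f.name for f in fields(Customer)}
    return sorted(all_fields - USED_BY[division])


print(unused_fields("disability"))
print(unused_fields("auto"))
```

With two divisions the overhead is one field each; with every division in an enterprise contributing its own details, the canonical definition grows until most of it is irrelevant to any given team.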
Perhaps the most damaging revelation from this architecture came with the realization of the impracticality of building an architecture so focused on technical partitioning. While it makes sense from a separation and reuse philosophy standpoint, it was a practical nightmare. Domain concepts like CatalogCheckout were spread so thinly throughout this architecture that they were virtually ground to dust. Developers commonly work on tasks like “add a new address line to CatalogCheckout.” In a service-oriented architecture, that could entail dozens of services in several different tiers, plus changes to a single database schema. And, if the current enterprise services aren’t defined at the correct transactional granularity, the developers will either have to change their design or build a new, near-identical service to change transactional behavior. So much for reuse. 也许这个架构最具破坏性的揭示是意识到构建一个如此专注于技术分区的架构是不切实际的。虽然从分离和重用的哲学角度来看是合理的,但实际上却是一场噩梦。像 CatalogCheckout 这样的领域概念在这个架构中分散得如此稀薄,以至于几乎被磨成了尘埃。开发人员通常会处理“向 CatalogCheckout 添加新地址行”这样的任务。在面向服务的架构中,这可能涉及到数十个服务和几个不同的层次,以及对单个数据库模式的更改。而且,如果当前的企业服务没有在正确的事务粒度上定义,开发人员将不得不更改他们的设计或构建一个新的、几乎相同的服务来改变事务行为。重用算什么呢。
Architecture Characteristics Ratings 架构特征评级
Many of the modern criteria we use to evaluate architecture now were not priorities when this architecture was popular. In fact, the Agile software movement had just started and had not penetrated into the size of organizations likely to use this architecture. 我们现在用来评估架构的许多现代标准,在这种架构流行时并不是优先考虑的。事实上,敏捷软件运动刚刚开始,并未渗透到可能使用这种架构的组织规模中。
A one-star rating in the characteristics ratings table in Figure 16-5 means the specific architecture characteristic isn’t well supported in the architecture, whereas a five-star rating means the architecture characteristic is one of the strongest features in the architecture style. The definition for each characteristic identified in the scorecard can be found in Chapter 4. 在图 16-5 的特征评分表中,一星评级意味着特定的架构特征在架构中支持不佳,而五星评级则意味着该架构特征是架构风格中最强的特征之一。评分卡中识别的每个特征的定义可以在第 4 章找到。
Service-oriented architecture is perhaps the most technically partitioned general-purpose architecture ever attempted! In fact, the backlash against the disadvantages of this structure led to more modern architectures such as microservices. It has a single quantum even though it is a distributed architecture for two reasons. First, it generally uses a single database or just a few databases, creating coupling points within the architecture across many different concerns. Second, and more importantly, the orchestration engine acts as a giant coupling point: no part of the architecture can have different architecture characteristics than the mediator that orchestrates all behavior. Thus, this architecture manages to find the disadvantages of both monolithic and distributed architectures. 服务导向架构可能是迄今为止尝试过的技术上最为分散的通用架构!事实上,对这种结构缺点的反弹导致了更现代的架构,如微服务。尽管它是一种分布式架构,但由于两个原因,它具有单一的量子。首先,它通常使用单一数据库或仅少数几个数据库,在架构内创建了许多不同关注点之间的耦合点。其次,更重要的是,编排引擎充当了一个巨大的耦合点:架构的任何部分都不能具有与协调所有行为的中介不同的架构特性。因此,这种架构设法找到了单体架构和分布式架构的缺点。
Figure 16-5. Ratings for service-oriented architecture 图 16-5. 面向服务架构的评分
Modern engineering goals such as deployability and testability score disastrously in this architecture, both because they were poorly supported and because those were not important (or even aspirational) goals during that era. 现代工程目标,如可部署性和可测试性,在这种架构中得分惨淡,既因为它们得到了很差的支持,也因为在那个时代这些目标并不重要(甚至不是理想目标)。
This architecture did support some goals such as elasticity and scalability, despite the difficulties in implementing those behaviors, because tool vendors poured enormous effort into making these systems scalable by building session replication across application servers and other techniques. However, being a distributed architecture, performance was never a highlight of this architecture style and was extremely poor because each business request was split across so much of the architecture. 这种架构确实支持了一些目标,例如弹性和可扩展性,尽管在实现这些行为时存在困难,因为工具供应商投入了大量精力,通过在应用服务器之间构建会话复制和其他技术来使这些系统可扩展。然而,作为一种分布式架构,性能从来不是这种架构风格的亮点,表现极其糟糕,因为每个业务请求在架构中被分割得过于细碎。
Because of all these factors, simplicity and cost have the inverse of the relationship most architects would prefer. This architecture was an important milestone because it taught architects how difficult distributed transactions can be in the real world and the practical limits of technical partitioning. 由于所有这些因素,简单性和成本之间的关系与大多数架构师所希望的正好相反。这种架构是一个重要的里程碑,因为它教会了架构师在现实世界中分布式事务可能有多困难以及技术分区的实际限制。
CHAPTER 17 第 17 章
Microservices Architecture 微服务架构
Microservices is an extremely popular architecture style that has gained significant momentum in recent years. In this chapter, we provide an overview of the important characteristics that set this architecture apart, both topologically and philosophically. 微服务是一种极受欢迎的架构风格,近年来获得了显著的关注。在本章中,我们提供了使这种架构在拓扑和哲学上与众不同的重要特征的概述。
History 历史
Most architecture styles are named after the fact by architects who notice a particular pattern that keeps reappearing; there is no secret group of architects who decide what the next big movement will be. Rather, it turns out that many architects end up making common decisions as the software development ecosystem shifts and changes. The common best ways of dealing with and profiting from those shifts become architecture styles that others emulate. 大多数架构风格是由注意到某种特定模式不断重现的架构师在事后命名的;并没有一个秘密的架构师团体来决定下一个重大潮流会是什么。相反,许多架构师最终会在软件开发生态系统的变化中做出共同的决策。应对这些变化并从中获利的常见最佳方法成为其他人模仿的架构风格。
Microservices differs in this regard—it was named fairly early in its usage and popularized by a famous blog entry by Martin Fowler and James Lewis entitled “Microservices,” published in March 2014. They recognized many common characteristics in this relatively new architectural style and delineated them. Their blog post helped define the architecture for curious architects and helped them understand the underlying philosophy. 微服务在这方面有所不同——它在使用初期就被命名,并通过马丁·福勒和詹姆斯·刘易斯于 2014 年 3 月发布的著名博客文章《Microservices》而广为人知。他们识别出这种相对较新的架构风格中的许多共同特征并进行了划分。他们的博客文章帮助好奇的架构师定义了这种架构,并帮助他们理解其基本哲学。
Microservices is heavily inspired by the ideas in domain-driven design (DDD), a logical design process for software projects. One concept in particular from DDD, bounded context, decidedly inspired microservices. The concept of bounded context represents a decoupling style. When a developer defines a domain, that domain includes many entities and behaviors, identified in artifacts such as code and database schemas. For example, an application might have a domain called CatalogCheckout, which includes notions such as catalog items, customers, and payment. In a traditional monolithic architecture, developers would share many of these concepts, building reusable classes and linked databases. Within a bounded context, the internal parts, such as code and data schemas, are coupled together to produce work; but they are never coupled to anything outside the bounded context, such as a database or class definition from another bounded context. This allows each context to define only what it needs rather than accommodating other constituents. 微服务深受领域驱动设计(DDD)思想的启发,这是一种针对软件项目的逻辑设计过程。DDD 中的一个特别概念,限界上下文,明确地启发了微服务。限界上下文的概念代表了一种解耦风格。当开发者定义一个领域时,该领域包括许多实体和行为,这些实体和行为在代码和数据库模式等文档中被识别。例如,一个应用程序可能有一个名为 CatalogCheckout 的领域,其中包括目录项、客户和支付等概念。在传统的单体架构中,开发者会共享许多这些概念,构建可重用的类和链接数据库。在一个有界上下文内,内部部分,如代码和数据模式,被耦合在一起以产生工作;但它们从不与有界上下文外的任何东西耦合,例如来自另一个有界上下文的数据库或类定义。这允许每个上下文仅定义其所需的内容,而不是适应其他组成部分。
While reuse is beneficial, remember the First Law of Software Architecture regarding trade-offs. The negative trade-off of reuse is coupling. When an architect designs a system that favors reuse, they also favor coupling to achieve that reuse, either by inheritance or composition. 虽然重用是有益的,但请记住软件架构的第一法则关于权衡。重用的负面权衡是耦合。当架构师设计一个偏向重用的系统时,他们也会偏向耦合以实现这种重用,无论是通过继承还是组合。
However, if the architect’s goal requires high degrees of decoupling, then they favor duplication over reuse. The primary goal of microservices is high decoupling, physically modeling the logical notion of bounded context. 然而,如果架构师的目标需要高度解耦,那么他们更倾向于重复而不是重用。微服务的主要目标是高度解耦,物理建模有界上下文的逻辑概念。
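Favoring duplication over reuse can look like the following sketch, in which two bounded contexts each define their own customer type and exchange only plain data. The context names, class names, and fields are hypothetical, and a single file stands in for what would be separate services.

```python
from dataclasses import dataclass
from typing import List


# --- Catalog bounded context (hypothetical) ---
@dataclass
class CatalogCustomer:
    customer_id: str
    favorite_categories: List[str]


# --- Payment bounded context (hypothetical) ---
@dataclass
class PaymentCustomer:
    customer_id: str
    billing_address: str


def payment_customer_from_message(message):
    """Build the payment context's own view from a plain message, not a shared class."""
    return PaymentCustomer(
        customer_id=message["customer_id"],
        billing_address=message["billing_address"],
    )


message = {
    "customer_id": "c-1",
    "billing_address": "1 Main St",
    "favorite_categories": ["books"],  # ignored by the payment context
}
payer = payment_customer_from_message(message)
print(payer)
```

The `customer_id` field is duplicated in both definitions, which is the cost. In exchange, the catalog context can change `favorite_categories` without any effect on payment code, which is the decoupling the paragraph describes.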
Topology 拓扑
The topology of microservices is shown in Figure 17-1. 微服务的拓扑如图 17-1 所示。
Figure 17-1. The topology of the microservices architecture style 图 17-1. 微服务架构风格的拓扑结构
As illustrated in Figure 17-1, due to its single-purpose nature, the service size in microservices is much smaller than other distributed architectures, such as the orchestration-driven service-oriented architecture. Architects expect each service to include all necessary parts to operate independently, including databases and other dependent components. The different characteristics appear in the following sections. 如图 17-1 所示,由于其单一目的的特性,微服务中的服务大小远小于其他分布式架构,例如以编排驱动的服务导向架构。架构师期望每个服务都包含独立运行所需的所有部分,包括数据库和其他依赖组件。不同的特征将在以下部分中出现。
Distributed
Microservices form a distributed architecture: each service runs in its own process, which originally implied a physical computer but quickly evolved to virtual machines and containers. Decoupling the services to this degree allows for a simple solution to a common problem in architectures that heavily feature multitenant infrastructure for hosting applications. For example, using an application server to manage multiple running applications allows operational reuse of network bandwidth, memory, and disk space, along with a host of other benefits. However, if all the supported applications continue to grow, eventually some resource becomes constrained on the shared infrastructure. Another problem concerns improper isolation between shared applications.
Separating each service into its own process solves all the problems brought on by sharing. Before the evolutionary development of freely available open source operating systems, combined with automated machine provisioning, it was impractical for each domain to have its own infrastructure. Now, however, with cloud resources and container technology, teams can reap the benefits of extreme decoupling, at both the domain and operational levels.
Degraded performance is often the negative side effect of the distributed nature of microservices. Network calls take much longer than method calls, and security verification at every endpoint adds processing time, requiring architects to think carefully about the implications of granularity when designing the system.
Because microservices is a distributed architecture, experienced architects advise against the use of transactions across service boundaries, which makes determining the granularity of services the key to success in this architecture.
Bounded Context
The driving philosophy of microservices is the notion of bounded context: each service models a domain or workflow. Thus, each service includes everything necessary to operate within the application, including classes, other subcomponents, and database schemas. This philosophy drives many of the decisions architects make within this architecture. For example, in a monolith, it is common for developers to share common classes, such as Address, between disparate parts of the application. However, microservices try to avoid coupling, and thus an architect building this architecture style prefers duplication to coupling.
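As a minimal sketch of preferring duplication to coupling, the two hypothetical bounded contexts below each define their own address type with only the fields they need, instead of importing one shared `Address` class. The context names (Shipping, Billing), the fields, and the methods are illustrative assumptions, not taken from the text.

```python
from dataclasses import dataclass


# Hypothetical Shipping bounded context: needs a full postal address.
@dataclass(frozen=True)
class ShippingAddress:
    street: str
    city: str
    postal_code: str
    country: str

    def label(self) -> str:
        return f"{self.street}, {self.postal_code} {self.city}, {self.country}"


# Hypothetical Billing bounded context: only needs enough to pick a tax region.
@dataclass(frozen=True)
class BillingAddress:
    postal_code: str
    country: str

    def tax_region(self) -> str:
        return f"{self.country}-{self.postal_code[:2]}"
```

Because neither class references the other, either team can add or remove fields without coordinating a change with the other service; the cost is that the notion of an address is defined twice.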
Microservices take the concept of a domain-partitioned architecture to the extreme. Each service is meant to represent a domain or subdomain; in many ways, microservices is the physical embodiment of the logical concepts in domain-driven design.
Granularity
Architects struggle to find the correct granularity for services in microservices, and often make the mistake of making their services too small, which requires them to build communication links back between the services to do useful work.
The term “microservice” is a label, not a description.
-Martin Fowler
In other words, the originators of the term needed to call this new style something, and they chose “microservices” to contrast it with the dominant architecture style at the time, service-oriented architecture, which could have been called “gigantic services”. However, many developers take the term “microservices” as a commandment, not a description, and create services that are too fine-grained.
The purpose of service boundaries in microservices is to capture a domain or workflow. In some applications, those natural boundaries might be large for some parts of the system, because some business processes are more coupled than others. Here are some guidelines architects can use to help find the appropriate boundaries:
Purpose
The most obvious boundary relies on the inspiration for the architecture style, a domain. Ideally, each microservice should be extremely functionally cohesive, contributing one significant behavior on behalf of the overall application.
Transactions
Bounded contexts are business workflows, and often the entities that need to cooperate in a transaction show architects a good service boundary. Because transactions cause issues in distributed architectures, if architects can design their system to avoid them, they generate better designs.
Choreography
If an architect builds a set of services that offer excellent domain isolation yet require extensive communication to function, the architect may consider bundling these services back into a larger service to avoid the communication overhead.
Iteration is the only way to ensure good service design. Architects rarely discover the perfect granularity, data dependencies, and communication styles on their first pass. However, after iterating over the options, an architect has a good chance of refining their design.
Data Isolation
Another requirement of microservices, driven by the bounded context concept, is data isolation. Many other architecture styles use a single database for persistence. However, microservices tries to avoid all kinds of coupling, including shared schemas and databases used as integration points.
Data isolation is another factor an architect must consider when looking at service granularity. Architects must be wary of the entity trap (discussed in “Entity trap” on page 110) and not simply model their services to resemble single entities in a database.
Architects are accustomed to using relational databases to unify values within a system, creating a single source of truth, which is no longer an option when distributing data across the architecture. Thus, architects must decide how they want to handle this problem: either identifying one domain as the source of truth for some fact and coordinating with it to retrieve values or using database replication or caching to distribute information.
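The caching option can be sketched as a small read-through cache that sits in front of a call to the service that owns a fact. This is only an illustration under stated assumptions: `CachedLookup`, its `fetch` callback (standing in for a network call to the source-of-truth service), and the time-to-live policy are hypothetical names and choices, not something the text prescribes.

```python
import time
from typing import Callable, Dict, Tuple


class CachedLookup:
    """Read-through cache in front of a call to the owning (source-of-truth) service."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical client call to the owning service
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so staleness can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]          # fresh enough: no network call
        value = self._fetch(key)     # stale or missing: ask the owner
        self._entries[key] = (now, value)
        return value
```

The trade-off is visible in the `ttl_seconds` parameter: a longer value saves network calls but lets the local copy drift further from the source of truth.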
While this level of data isolation creates headaches, it also provides opportunities. Now that teams aren’t forced to unify around a single database, each service can choose the most appropriate tool, based on price, type of storage, or a host of other factors. In a highly decoupled system, teams have the advantage of being able to change their minds and choose a more suitable database (or other dependency) without affecting other teams, which aren’t allowed to couple to implementation details.
API Layer
Most pictures of microservices include an API layer sitting between the consumers of the system (either user interfaces or calls from other systems) and the services, but it is optional. It is common because it offers a good location within the architecture to perform useful tasks, either via indirection as a proxy or as a tie into operational facilities, such as a naming service (covered in “Operational Reuse” on page 250).
While an API layer may be used for a variety of things, it should not be used as a mediator or orchestration tool if the architect wants to stay true to the underlying philosophy of this architecture: all interesting logic in this architecture should occur inside a bounded context, and putting orchestration or other logic in a mediator violates that rule. This also illustrates the difference between technical and domain partitioning in architecture: architects typically use mediators in technically partitioned architectures, whereas microservices is firmly domain partitioned.
Operational Reuse
Given that microservices prefers duplication to coupling, how do architects handle the parts of architecture that really do benefit from coupling, such as operational concerns like monitoring, logging, and circuit breakers? One of the philosophies in the traditional service-oriented architecture was to reuse as much functionality as possible, domain and operational alike. In microservices, architects try to split these two concerns.
Once a team has built several microservices, they realize that each has common elements that benefit from similarity. For example, if an organization allows each service team to implement monitoring themselves, how can they ensure that each team does so? And how do they handle concerns like upgrades? Does it become the responsibility of each team to handle upgrading to the new version of the monitoring tool, and how long will that take?
The sidecar pattern offers a solution to this problem, illustrated in Figure 17-2.
Figure 17-2. The sidecar pattern in microservices
In Figure 17-2, the common operational concerns appear within each service as a separate component, which can be owned by either individual teams or a shared infrastructure team. The sidecar component handles all the operational concerns that teams benefit from coupling together. Thus, when it comes time to upgrade the monitoring tool, the shared infrastructure team can update the sidecar, and each microservice receives that new functionality.
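The separation between domain code and shared operational code can be sketched in a few lines. This is a deliberately simplified, in-process illustration: real sidecars typically run as a separate process or container next to the service, and the `Sidecar` class, its `wrap` method, and the handler below are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List


class Sidecar:
    """Hypothetical shared operational component, owned by an infrastructure team."""

    version = "1.0"

    def __init__(self, service_name: str) -> None:
        self.service_name = service_name
        self.log: List[str] = []
        self.metrics: Dict[str, int] = {"calls": 0, "errors": 0}

    def wrap(self, handler: Callable[[str], str]) -> Callable[[str], str]:
        """Return the handler with logging and monitoring added around it."""
        def wrapped(request: str) -> str:
            self.metrics["calls"] += 1
            self.log.append(f"{self.service_name} <- {request}")
            try:
                return handler(request)
            except Exception:
                self.metrics["errors"] += 1
                raise
        return wrapped


# Domain code stays free of operational concerns.
def wish_list_handler(request: str) -> str:
    return f"wish list for {request}"


sidecar = Sidecar("CustomerWishList")
handle = sidecar.wrap(wish_list_handler)
```

In this sketch, upgrading monitoring means changing `Sidecar` once; `wish_list_handler` and every other domain handler remain untouched.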
Once teams know that each service includes a common sidecar, they can build a service mesh, allowing unified control across the architecture for concerns like logging and monitoring. The common sidecar components connect to form a consistent operational interface across all microservices, as shown in Figure 17-3.
Figure 17-3. The service plane connects the sidecars in a service mesh
In Figure 17-3, each sidecar wires into the service plane, which forms the consistent interface to each service.
The service mesh itself forms a console that allows developers holistic access to services, which is shown in Figure 17-4.
Figure 17-4. The service mesh forms a holistic view of the operational aspect of microservices
Each service forms a node in the overall mesh, as shown in Figure 17-4. The service mesh forms a console that allows teams to globally control operational coupling, such as monitoring levels, logging, and other cross-cutting operational concerns.
Architects use service discovery as a way to build elasticity into microservices architectures. Rather than invoke a single service, a request goes through a service discovery tool, which can monitor the number and frequency of requests, as well as spin up new instances of services to handle scale or elasticity concerns. Architects often include service discovery in the service mesh, making it part of every microservice. The API layer is often used to host service discovery, allowing a single place for user interfaces or other calling systems to find and create services in an elastic, consistent way.
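The lookup half of service discovery can be sketched as a small in-memory registry. Everything here is an assumption made for illustration: `ServiceRegistry`, its methods, the round-robin policy, and the addresses are hypothetical. The sketch only counts requests; the step where a real tool uses such counts to spin up new instances is omitted.

```python
import itertools
from collections import defaultdict
from typing import Dict, Iterator, List


class ServiceRegistry:
    """Hypothetical in-memory registry: callers resolve a name, not a fixed address."""

    def __init__(self) -> None:
        self._instances: Dict[str, List[str]] = defaultdict(list)
        self._cursors: Dict[str, Iterator[int]] = {}
        self.request_counts: Dict[str, int] = defaultdict(int)

    def register(self, service: str, address: str) -> None:
        self._instances[service].append(address)

    def resolve(self, service: str) -> str:
        instances = self._instances.get(service)
        if not instances:
            raise LookupError(f"no instances registered for {service}")
        # Count requests; a real tool could use this signal to scale out.
        self.request_counts[service] += 1
        cursor = self._cursors.setdefault(service, itertools.count())
        # Round-robin over whatever instances are currently registered.
        return instances[next(cursor) % len(instances)]
```

Because callers only ever hold a service name, instances can be added or removed behind the registry without any change on the calling side.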
Frontends
Microservices favors decoupling, which would ideally encompass the user interfaces as well as backend concerns. In fact, the original vision for microservices included the user interface as part of the bounded context, faithful to the principle in DDD. However, practicalities of the partitioning required by web applications and other external constraints make that goal difficult. Thus, two styles of user interfaces commonly appear for microservices architectures; the first appears in Figure 17-5.
Figure 17-5. Microservices architecture with a monolithic user interface
In Figure 17-5, the monolithic frontend features a single user interface that calls through the API layer to satisfy user requests. The frontend could be a rich desktop, mobile, or web application. For example, many web applications now use a JavaScript web framework to build a single user interface.
The second option for user interfaces uses microfrontends, shown in Figure 17-6.
Figure 17-6. Microfrontend pattern in microservices
In Figure 17-6, this approach utilizes components at the user interface level to create the same level of granularity and isolation in the user interface as in the backend services. Each service emits the user interface for that service, which the frontend coordinates with the other emitted user interface components. Using this pattern, teams can isolate service boundaries from the user interface to the backend services, unifying the entire domain within a single team.
Developers can implement the microfrontend pattern in a variety of ways, either using a component-based web framework such as React or using one of several open source frameworks that support this pattern.
Communication
In microservices, architects and developers struggle with appropriate granularity, which affects both data isolation and communication. Finding the correct communication style helps teams keep services decoupled yet still coordinated in useful ways.
Fundamentally, architects must decide on synchronous or asynchronous communication. Synchronous communication requires the caller to wait for a response from the callee. Microservices architectures typically utilize protocol-aware heterogeneous interoperability. We’ll break down that term for you:
Protocol-aware
Because microservices usually don’t include a centralized integration hub to avoid operational coupling, each service should know how to call other services. Thus, architects commonly standardize on how particular services call each other: a certain level of REST, message queues, and so on. That means that services must know (or discover) which protocol to use to call other services.
Heterogeneous
Because microservices is a distributed architecture, each service may be written in a different technology stack. Heterogeneous suggests that microservices fully supports polyglot environments, where different services use different platforms.
Interoperability
Describes services calling one another. While architects in microservices try to discourage transactional method calls, services commonly call other services via the network to collaborate and send/receive information.
Enforced Heterogeneity
A well-known architect who was a pioneer in the microservices style was the chief architect at a personal information manager startup for mobile devices. Because they had a fast-moving problem domain, the architect wanted to ensure that none of the development teams accidentally created coupling points between each other, hindering the teams’ ability to move independently. It turned out that this architect had a wide mix of technical skills on the teams, and thus mandated that each development team use a different technology stack. If one team was using Java and the other was using .NET, it was impossible to accidentally share classes!
This approach is the polar opposite of most enterprise governance policies, which insist on standardizing on a single technology stack. The goal in the microservices world isn’t to create the most complex ecosystem possible, but rather to choose the correctly scaled technology for the narrow scope of the problem. Not every service needs an industrial-strength relational database, and forcing it on small teams slows them rather than benefitting them. This concept leverages the highly decoupled nature of microservices.
For asynchronous communication, architects often use events and messages, thus internally utilizing an event-driven architecture, covered in Chapter 14; the broker and mediator patterns manifest in microservices as choreography and orchestration.
Choreography and Orchestration
Choreography utilizes the same communication style as a broker event-driven architecture. In other words, no central coordinator exists in this architecture, respecting the bounded context philosophy. Thus, architects find it natural to implement decoupled events between services.
Domain/architecture isomorphism is one key characteristic that architects should look for when assessing how appropriate an architecture style is for a particular problem. This term describes how the shape of an architecture maps to a particular architecture style. For example, in Figure 8-7, the Silicon Sandwiches’ technically partitioned architecture structurally supports customizability, and the microkernel architecture style offers the same general structure. Therefore, problems that require a high degree of customization become easier to implement in a microkernel.
Similarly, because the architect’s goal in a microservices architecture favors decoupling, the shape of microservices resembles the broker EDA, making these two patterns symbiotic.
In choreography, each service calls other services as needed, without a central mediator. For example, consider the scenario shown in Figure 17-7.
Figure 17-7. Using choreography in microservices to manage coordination
In Figure 17-7, the user requests details about a user’s wish list. Because the CustomerWishList service doesn’t contain all the necessary information, it makes a call to CustomerDemographics to retrieve the missing information, returning the result to the user.
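The wish list scenario can be sketched as two plain classes, where one service calls its peer directly and no coordinator sits between them. This is a simplified illustration: in a real system the call would go over the network, and the method names (`name_of`, `details`) and the data shapes are assumptions made for the example.

```python
from typing import Dict, List


class CustomerDemographics:
    """Stands in for the service that owns customer details."""

    def __init__(self, names: Dict[str, str]) -> None:
        self._names = names

    def name_of(self, customer_id: str) -> str:
        return self._names[customer_id]


class CustomerWishList:
    """Owns wish list items, but not customer details."""

    def __init__(self, items: Dict[str, List[str]],
                 demographics: CustomerDemographics) -> None:
        self._items = items
        self._demographics = demographics  # stands in for a network client

    def details(self, customer_id: str) -> Dict[str, object]:
        # No mediator: this service asks its peer for the data it lacks.
        return {
            "customer": self._demographics.name_of(customer_id),
            "items": list(self._items.get(customer_id, [])),
        }
```

The coupling in this sketch is limited to the one call that `CustomerWishList` makes, which is what keeps choreography simple for short workflows.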
Because microservices architectures don’t include a global mediator like other service-oriented architectures, if an architect needs to coordinate across several services, they can create their own localized mediator, as shown in Figure 17-8.
Figure 17-8. Using orchestration in microservices
In Figure 17-8, the developers create a service whose sole responsibility is coordinating the call to get all information for a particular customer. The user calls the ReportCustomerInformation mediator, which calls the necessary other services.
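A localized mediator of this kind can be sketched as a class that holds no domain data of its own and only gathers results from other services. The `sources` mapping and the `report` method are hypothetical; each callable stands in for a network call to one service.

```python
from typing import Callable, Dict


class ReportCustomerInformation:
    """Localized mediator: its only job is to coordinate calls to other services."""

    def __init__(self, sources: Dict[str, Callable[[str], object]]) -> None:
        # Each value stands in for a client call to one service (an assumption).
        self._sources = sources

    def report(self, customer_id: str) -> Dict[str, object]:
        # Call every participating service and assemble one combined answer.
        return {name: fetch(customer_id) for name, fetch in self._sources.items()}
```

A caller might build it with something like `ReportCustomerInformation({"demographics": demographics_client, "wish_list": wish_list_client})`, where both clients are placeholders. The participating services stay unaware of each other; only the mediator knows the full set.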
The First Law of Software Architecture suggests that neither of these solutions is perfect; each has trade-offs. In choreography, the architect preserves the highly decoupled philosophy of the architecture style, thus reaping maximum benefits touted by the style. However, common problems like error handling and coordination become more complex in choreographed environments.
Consider an example with a more complex workflow, shown in Figure 17-9.
Figure 17-9. Using choreography for a complex business process
In Figure 17-9, the first service called must coordinate across a wide variety of other services, basically acting as a mediator in addition to its other domain responsibilities. This pattern is called the front controller pattern, where a nominally choreographed service becomes a more complex mediator for some problem. The downside to this pattern is added complexity in the service.
Alternatively, an architect may choose to use orchestration for complex business processes, illustrated in Figure 17-10.
Figure 17-10. Using orchestration for a complex business process
In Figure 17-10, the architect builds a mediator to handle the complexity and coordination required for the business workflow. While this creates coupling between these services, it allows the architect to focus coordination into a single service, leaving the others less affected. Often, domain workflows are inherently coupled; the architect’s job entails finding the best way to represent that coupling in ways that support both the domain and architectural goals.
Transactions and Sagas
Architects aspire to extreme decoupling in microservices, but then often encounter the problem of how to do transactional coordination across services. Because the decoupling in the architecture encourages the same level for the databases, atomicity that was trivial in monolithic applications becomes a problem in distributed ones.
Building transactions across service boundaries violates the core decoupling principle of the microservices architecture (and also creates the worst kind of dynamic connascence, connascence of value). The best advice for architects who want to do transactions across services is: don’t! Fix the granularity of the components instead. Often, architects who build microservices architectures and then find a need to wire them together with transactions have gone too granular in their design. Transaction boundaries are one of the common indicators of service granularity.
Don’t do transactions in microservices; fix granularity instead!
Exceptions always exist. For example, a situation may arise where two different services need vastly different architecture characteristics, requiring distinct service boundaries, yet still need transactional coordination. In those situations, patterns exist to handle transaction orchestration, with serious trade-offs.
A popular distributed transactional pattern in microservices is the saga pattern, illustrated in Figure 17-11.
Figure 17-11. The saga pattern in microservices architecture
In Figure 17-11, a service acts as a mediator across multiple service calls and coordinates the transaction. The mediator calls each part of the transaction, records success or failure, and coordinates results. If everything goes as planned, all the values in the services and their contained databases update synchronously.
In an error condition, the mediator must ensure that no part of the transaction succeeds if one part fails. Consider the situation shown in Figure 17-12.
Figure 17-12. Saga pattern compensating transactions for error conditions
In Figure 17-12, if the first part of the transaction succeeds, yet the second part fails, the mediator must send a request to all the parts of the transaction that were successful and tell them to undo the previous request. This style of transactional coordination is called a compensating transaction framework. Developers usually implement this pattern by having each request from the mediator enter a pending state until the mediator indicates overall success. However, this design becomes complex if asynchronous requests must be juggled, especially if new requests appear that are contingent on pending transactional state. This also creates a lot of coordination traffic at the network level.
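The pending-state approach can be sketched with a mediator function and a participant class. This is a synchronous, in-memory simplification that ignores the asynchronous cases the paragraph warns about; `Participant`, `prepare`, `confirm`, `cancel`, and `run_saga` are hypothetical names, and the `fail` flag exists only to simulate an error.

```python
from typing import Dict, List


class Participant:
    """Hypothetical service that holds each request as pending until told the outcome."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self._fail = fail                 # simulates a failing service
        self.state: Dict[str, str] = {}

    def prepare(self, tx_id: str) -> bool:
        if self._fail:
            return False
        self.state[tx_id] = "pending"
        return True

    def confirm(self, tx_id: str) -> None:
        self.state[tx_id] = "committed"

    def cancel(self, tx_id: str) -> None:
        self.state[tx_id] = "cancelled"


def run_saga(tx_id: str, participants: List[Participant]) -> bool:
    """Mediator: call each part, record success, and undo on the first failure."""
    pending: List[Participant] = []
    for p in participants:
        if not p.prepare(tx_id):
            for done in pending:          # compensate everything that succeeded so far
                done.cancel(tx_id)
            return False
        pending.append(p)
    for p in pending:                     # overall success: release the pending state
        p.confirm(tx_id)
    return True
```

Even in this small sketch, every transaction costs at least two calls per participant, which hints at the coordination traffic mentioned above.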
Another implementation of a compensating transaction framework has developers build do and undo for each potentially transactional operation. This allows less coordination during transactions, but the undo operations tend to be significantly more complex than the do operations, more than doubling the design, implementation, and debugging work.
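The do/undo variant can be sketched as a list of paired callables that runs forward and, on failure, applies the recorded undo operations in reverse order. `run_with_compensation` and the `Step` alias are hypothetical names; the sketch also assumes that undo operations themselves never fail, which is exactly the part that tends to be hard in practice.

```python
from typing import Callable, List, Tuple

# Each step pairs a "do" operation with the "undo" that compensates for it.
Step = Tuple[Callable[[], None], Callable[[], None]]


def run_with_compensation(steps: List[Step]) -> bool:
    undo_stack: List[Callable[[], None]] = []
    for do, undo in steps:
        try:
            do()
        except Exception:
            # Undo only the steps that completed, most recent first.
            for compensate in reversed(undo_stack):
                compensate()
            return False
        undo_stack.append(undo)
    return True
```

Note that the failing step’s own undo is never called, since its do operation did not complete; a real implementation would also have to handle partially applied operations.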
While it is possible for architects to build transactional behavior across services, it goes against the reason for choosing the microservices pattern. Exceptions always exist, so the best advice for architects is to use the saga pattern sparingly.
A few transactions across services are sometimes necessary; if they are the dominant feature of the architecture, mistakes were made!
Architecture Characteristics Ratings
The microservices architecture style offers several extremes on our standard ratings scale, shown in Figure 17-13. A one-star rating means the specific architecture characteristic isn’t well supported in the architecture, whereas a five-star rating means the architecture characteristic is one of the strongest features in the architecture style. The definition for each characteristic identified in the scorecard can be found in Chapter 4.
Notable is the high support for modern engineering practices such as automated deployment, testability, and others not listed. Microservices couldn’t exist without the DevOps revolution and the relentless march toward automating operational concerns.
As microservices is a distributed architecture, it suffers from many of the deficiencies inherent in architectures made from pieces wired together at runtime. Thus, fault tolerance and reliability are impacted when too much interservice communication is used. However, these ratings only point to tendencies in the architecture; developers fix many of these problems by redundancy and scaling via service discovery. Under normal circumstances, however, independent, single-purpose services generally lead to high fault tolerance, hence the high rating for this characteristic within a microservices architecture.
| Architecture characteristic | Star rating |
| :---: | :---: |
| Partitioning type | Domain |
| Number of quanta | 1 to many |
| Deployability | (star rating illegible in source) |
| Elasticity | (star rating illegible in source) |
| Evolutionary | (star rating illegible in source) |
| Fault tolerance | (star rating illegible in source) |
| Modularity | (star rating illegible in source) |
| Overall cost | (star rating illegible in source) |
| Performance | (star rating illegible in source) |
| Reliability | (star rating illegible in source) |
| Scalability | (star rating illegible in source) |
| Simplicity | (star rating illegible in source) |
| Testability | (star rating illegible in source) |
Figure 17-13.Ratings for microservices 图 17-13.微服务的评分
The high points of this architecture are scalability, elasticity, and evolutionary change. Some of the most scalable systems yet written have used this style to great success. Similarly, because the architecture relies heavily on automation and intelligent integration with operations, developers can also build elasticity support into the architecture. Because the architecture favors high decoupling at an incremental level, it supports the modern business practice of evolutionary change, even at the architecture level. Modern businesses move fast, and software development has struggled to keep pace. By building an architecture that has extremely small, highly decoupled deployment units, architects have a structure that can support a faster rate of change.
Performance is often an issue in microservices: distributed architectures must make many network calls to complete work, which carries high performance overhead, and they must invoke security checks to verify identity and access for each endpoint. Many patterns exist in the microservices world to increase performance, including intelligent data caching and replication to prevent an excess of network calls.
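One way to picture the caching idea is a small read-through cache placed in front of a remote lookup. The sketch below is illustrative only: `ReadThroughCache`, `fetch_customer`, and the 30-second time-to-live are hypothetical names and values, and the "remote call" is a local stand-in rather than a real network request.

```python
import time
from typing import Callable, Dict, List, Tuple


class ReadThroughCache:
    """Caches the result of an expensive lookup for a limited time (TTL)."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch      # the expensive call, e.g. a request to another service
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so the sketch is easy to test
        self._entries: Dict[str, Tuple[float, dict]] = {}

    def get(self, key: str) -> dict:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]     # fresh entry: no network call needed
        value = self._fetch(key)  # miss or stale entry: ask the owning service
        self._entries[key] = (now, value)
        return value


# Hypothetical stand-in for a call to another service; it records each invocation.
calls: List[str] = []


def fetch_customer(customer_id: str) -> dict:
    calls.append(customer_id)
    return {"id": customer_id, "tier": "gold"}


cache = ReadThroughCache(fetch_customer, ttl_seconds=30.0)
for _ in range(5):
    cache.get("c-42")
print(len(calls))  # 1: five reads resulted in a single remote call
```

The trade-off is staleness: a longer TTL saves more calls but serves older data, which is one reason caching tends to suit reference data that changes rarely.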
Performance is another reason that microservices often use choreography rather than orchestration, as less coupling allows for faster communication and fewer bottlenecks.
Microservices is decidedly a domain-centered architecture, where each service boundary should correspond to a domain. It also has the most distinct quanta of any modern architecture; in many ways, it exemplifies what the quantum measure evaluates. The driving philosophy of extreme decoupling creates many headaches in this architecture but yields tremendous benefits when done well. As in any architecture, architects must understand the rules to break them intelligently.
Additional References
While our goal in this chapter was to touch on some of the significant aspects of this architecture style, many excellent resources cover it in more depth. The following references provide additional and more detailed information about microservices:
Building Microservices by Sam Newman (O’Reilly)
Microservices vs. Service-Oriented Architecture by Mark Richards (O’Reilly)
Microservices AntiPatterns and Pitfalls by Mark Richards (O’Reilly)
CHAPTER 18
Choosing the Appropriate Architecture Style
It depends! With all the choices available (and new ones arriving almost daily), we would like to tell you which one to use, but we cannot. Few decisions depend more heavily on context: the many factors within an organization and the software it builds. Choosing an architecture style represents the culmination of analysis and thought about trade-offs for architecture characteristics, domain considerations, strategic goals, and a host of other things. However contextual the decision is, some general advice exists around choosing an appropriate architecture style.
Shifting “Fashion” in Architecture
Preferred architecture styles shift over time, driven by a number of factors:
Observations from the past
New architecture styles generally arise from observations and pain points from past experiences. Architects have experience with systems in the past that influences their thoughts about future systems. Architects must rely on their past experience; it is that experience that allowed them to become architects in the first place. Often, new architecture designs reflect specific deficiencies from past architecture styles. For example, architects seriously rethought the implications of code reuse after building architectures that featured it and then realizing the negative trade-offs.
Changes in the ecosystem
Constant change is a reliable feature of the software development ecosystem: everything changes all the time. The change in our ecosystem is particularly chaotic, making even the type of change impossible to predict. For example, a few years ago, no one knew what Kubernetes was, and now there are multiple conferences around the world with thousands of developers. In a few more years, Kubernetes may be replaced with some other tool that hasn’t been written yet.
New capabilities
When new capabilities arise, architecture may not merely replace one tool with another but rather shift to an entirely new paradigm. For example, few architects or developers anticipated the tectonic shift caused in the software development world by the advent of containers such as Docker. While it was an evolutionary step, the impact it had on architects, tools, engineering practices, and a host of other factors astounded most in the industry. The constant change in the ecosystem also delivers a new collection of tools and capabilities on a regular basis. Architects must keep a keen eye open not only for new tools but also for new paradigms. Something may just look like a new one-of-something-we-already-have, but it may include nuances or other changes that make it a game changer. New capabilities don’t even have to rock the entire development world; the new features may be a minor change that aligns exactly with an architect’s goals.
Acceleration
Not only does the ecosystem constantly change, but the rate of change also continues to rise. New tools create new engineering practices, which lead to new designs and capabilities. Architects live in a constant state of flux because change is both pervasive and constant.
Domain changes
The domain that developers write software for constantly shifts and changes, either because the business continues to evolve or because of factors like mergers with other companies.
Technology changes
As technology continues to evolve, organizations try to keep up with at least some of these changes, especially those with obvious bottom-line benefits.
External factors
Many external factors only peripherally associated with software development may drive change within an organization. For example, architects and developers might be perfectly happy with a particular tool, but the licensing cost has become prohibitive, forcing a migration to another option.
Regardless of where an organization stands in terms of current architecture fashion, an architect should understand current industry trends to make intelligent decisions about when to follow and when to make exceptions.
Decision Criteria
When choosing an architectural style, an architect must take into account all the various factors that contribute to the structure for the domain design. Fundamentally, an architect designs two things: whatever domain has been specified, and all the other structural elements required to make the system a success.
Architects should go into the design decision comfortable with the following things:
The domain
Architects should understand many important aspects of the domain, especially those that affect operational architecture characteristics. Architects don’t have to be subject matter experts, but they must have at least a good general understanding of the major aspects of the domain under design.
Architecture characteristics that impact structure
Architects must discover and elucidate the architecture characteristics needed to support the domain and other external factors.
Data architecture
Architects and DBAs must collaborate on database, schema, and other data-related concerns. We don’t cover much about data architecture in this book; it is its own specialization. However, architects must understand the impact that data design might have on their design, particularly if the new system must interact with an older and/or in-use data architecture.
Organizational factors
Many external factors may influence design. For example, the cost of a particular cloud vendor may prevent the ideal design. Or perhaps the company plans to engage in mergers and acquisitions, which encourages an architect to gravitate toward open solutions and integration architectures.
Knowledge of process, teams, and operational concerns
Many specific project factors influence an architect’s design, such as the software development process, interaction (or lack thereof) with operations, and the QA process. For example, if an organization lacks maturity in Agile engineering practices, architecture styles that rely on those practices for success will present difficulties.
Domain/architecture isomorphism
Some problem domains match the topology of the architecture. For example, the microkernel architecture style is perfectly suited to a system that requires customizability: the architect can design customizations as plug-ins. Another example might be genome analysis, which requires a large number of discrete operations, and space-based architecture, which offers a large number of discrete processors.
Similarly, some problem domains may be particularly ill-suited for some architecture styles. For example, highly scalable systems struggle with large monolithic designs because architects find it difficult to support a large number of concurrent users in a highly coupled code base. A problem domain that includes a huge amount of semantic coupling matches poorly with a highly decoupled, distributed architecture. For instance, an insurance company application consisting of multipage forms, each of which is based on the context of previous pages, would be difficult to model in microservices. This is a highly coupled problem that will present architects with design challenges in a decoupled architecture; a less decoupled architecture like service-based architecture would suit this problem better.
Taking all these things into account, the architect must make several determinations:
Monolith versus distributed
Using the quantum concepts discussed earlier, the architect must determine whether a single set of architecture characteristics will suffice for the design or whether different parts of the system need differing architecture characteristics. A single set implies that a monolith is suitable (although other factors may drive an architect toward a distributed architecture), whereas different architecture characteristics imply a distributed architecture.
Where should data live?
If the architecture is monolithic, architects commonly assume a single relational database or a few of them. In a distributed architecture, the architect must decide which services should persist data, which also implies thinking about how data must flow throughout the architecture to build workflows. Architects must consider both structure and behavior when designing architecture and not be fearful of iterating on the design to find better combinations.
What communication styles between services: synchronous or asynchronous?
Once the architect has determined data partitioning, their next design consideration is the communication between services: synchronous or asynchronous? Synchronous communication is more convenient in most cases, but it can harm scalability, reliability, and other desirable characteristics. Asynchronous communication can provide unique benefits in terms of performance and scale but can present a host of headaches: data synchronization, deadlocks, race conditions, debugging, and so on.
Because synchronous communication presents fewer design, implementation, and debugging challenges, architects should default to synchronous when possible and use asynchronous when necessary.
Use synchronous by default, asynchronous when necessary.
The output of this design process is an architecture topology, which takes into account the architecture style (and hybridizations) the architect chose, along with architecture decision records about the parts of the design that required the most effort from the architect and architecture fitness functions to protect important principles and operational architecture characteristics.
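The monolith-versus-distributed determination above can be sketched as a simple grouping exercise. This is a deliberately simplified illustration, not the full quantum analysis: it only compares the sets of architecture characteristics that components need, and every component name and characteristic assignment below is hypothetical.

```python
from typing import Dict, FrozenSet, List

Characteristics = FrozenSet[str]


def group_by_characteristics(
    components: Dict[str, Characteristics],
) -> Dict[Characteristics, List[str]]:
    """Group components that need an identical set of architecture characteristics."""
    groups: Dict[Characteristics, List[str]] = {}
    for name, needs in components.items():
        groups.setdefault(needs, []).append(name)
    return groups


def suggest_topology(components: Dict[str, Characteristics]) -> str:
    """One shared set suggests a monolith; differing sets suggest a distributed design."""
    return "monolith" if len(group_by_characteristics(components)) <= 1 else "distributed"


# Hypothetical inputs: every component shares the same needs.
silicon_sandwiches = {
    "ordering": frozenset({"customizability", "simplicity"}),
    "recipes": frozenset({"customizability", "simplicity"}),
    "promotions": frozenset({"customizability", "simplicity"}),
}

# Hypothetical inputs: different parts of the system need different things.
going_going_gone = {
    "bid_capture": frozenset({"scalability", "elasticity"}),
    "auctioneer_capture": frozenset({"reliability", "availability"}),
    "payment": frozenset({"reliability", "security"}),
}

print(suggest_topology(silicon_sandwiches))  # monolith
print(suggest_topology(going_going_gone))    # distributed
```

The result is only a starting point; as the text notes, other factors may still drive an architect toward a distributed architecture even when a single set of characteristics would suffice.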
Monolith Case Study: Silicon Sandwiches
In the Silicon Sandwiches architecture kata, after investigating the architecture characteristics, we determined that a single quantum was sufficient to implement this system. Plus, this is a simple application without a huge budget, so the simplicity of a monolith appeals.
However, we created two different component designs for Silicon Sandwiches: one domain partitioned and another technically partitioned. Given the simplicity of the solution, we’ll create designs for each and cover trade-offs.
Modular Monolith
A modular monolith builds domain-centric components with a single database, deployed as a single quantum; the modular monolith design for Silicon Sandwiches appears in Figure 18-1.
This is a monolith with a single relational database, implemented with a single web-based user interface (with careful design considerations for mobile devices) to keep overall cost down. Each of the domains the architect identified earlier appears as a component. If time and resources are sufficient, the architect should consider creating the same separation of tables and other database assets as the domain components, allowing this architecture to migrate to a distributed architecture more easily if future requirements warrant it.
Figure 18-1. A modular monolith implementation of Silicon Sandwiches
Because the architecture style itself doesn’t inherently handle customization, the architect must make sure that this feature becomes part of the domain design. In this case, the architect designs an Override endpoint where developers can upload individual customizations. Correspondingly, the architect must ensure that each of the domain components references the Override component for each customizable characteristic; this would make a perfect fitness function.
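A fitness function of this kind might be sketched as a check that every domain component imports the Override component. The example is an assumption-laden illustration: it presumes the components are Python modules, that the Override component is a module named `override`, and it checks only at the component level rather than per customizable characteristic. The component names and sources are hypothetical.

```python
import ast
from typing import Dict, List, Set


def imported_modules(source: str) -> Set[str]:
    """Return the top-level names of all modules imported by a piece of Python source."""
    names: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            names.add(node.module.split(".")[0])
    return names


def components_missing_override(sources: Dict[str, str],
                                override_module: str = "override") -> List[str]:
    """List the domain components that never reference the Override component."""
    return sorted(name for name, source in sources.items()
                  if override_module not in imported_modules(source))


# Hypothetical component sources; a real check would read the files from disk.
sources = {
    "ordering": "from override import lookup\n\ndef price(item):\n    return lookup('price', item)\n",
    "promotions": "import override\n",
    "recipes": "import json\n",
}

print(components_missing_override(sources))  # ['recipes']
```

Run as part of a build, a check like this would fail whenever a new domain component forgets to consult the Override component.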
Microkernel
One of the architecture characteristics the architect identified in Silicon Sandwiches was customizability. Looking at domain/architecture isomorphism, an architect may choose to implement it using a microkernel, as illustrated in Figure 18-2.
Figure 18-2. A microkernel implementation of Silicon Sandwiches
In Figure 18-2, the core system consists of the domain components and a single relational database. As in the previous design, careful synchronization between domains and data design will allow future migration of the core to a distributed architecture. Each customization appears in a plug-in: the common ones in a single set of plug-ins (with a corresponding database), and a series of local ones, each with its own data. Because none of the plug-ins needs to be coupled to the other plug-ins, each can maintain its own data, leaving the plug-ins decoupled.
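The plug-in arrangement can be illustrated with a small registry in which the core system knows only a plug-in's name and call signature, while each plug-in keeps its own data. This is a minimal sketch under assumed names: `CoreSystem`, `make_local_special`, `add_loyalty_note`, and the order fields are all hypothetical.

```python
from typing import Any, Callable, Dict, List

Order = Dict[str, Any]
Plugin = Callable[[Order], Order]


class CoreSystem:
    """Minimal core: it knows plug-ins only by name and by a shared call signature."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}

    def register(self, name: str, plugin: Plugin) -> None:
        if name in self._plugins:
            raise ValueError(f"plug-in already registered: {name}")
        self._plugins[name] = plugin

    def process(self, order: Order, enabled: List[str]) -> Order:
        result = dict(order)  # never mutate the caller's order
        for name in enabled:
            result = self._plugins[name](result)
        return result


def make_local_special(discounts: Dict[str, float]) -> Plugin:
    """A store-local plug-in that owns its data (here, a plain dict of discount rates)."""
    def apply(order: Order) -> Order:
        updated = dict(order)
        rate = discounts.get(order["item"], 0.0)
        updated["price"] = round(order["price"] * (1 - rate), 2)
        return updated
    return apply


def add_loyalty_note(order: Order) -> Order:
    """A common plug-in with no knowledge of any other plug-in."""
    updated = dict(order)
    updated["note"] = "loyalty points apply"
    return updated


core = CoreSystem()
core.register("store-17-special", make_local_special({"reuben": 0.1}))
core.register("loyalty", add_loyalty_note)

print(core.process({"item": "reuben", "price": 10.0}, ["store-17-special", "loyalty"]))
```

Because the plug-ins never call each other, adding or removing one changes only the list of enabled names, not the core or the other plug-ins.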
The other unique design element here utilizes the Backends for Frontends (BFF) pattern, making the API layer a thin microkernel adaptor. It supplies general information from the backend, and the BFF adaptors translate the generic information into the suitable format for the frontend device. For example, the BFF for iOS will take the generic output from the backend and customize it for what the iOS native application expects: the data format, pagination, latency, and other factors. Building each BFF adaptor allows for the richest user interfaces and the ability to expand to support other devices in the future, which is one of the benefits of the microkernel style.
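As a rough illustration of what a BFF adaptor does, the sketch below takes one generic backend result and reshapes it for two frontends. Everything here is hypothetical: the menu data, the field names, and the choice to paginate for iOS but not for the web are assumptions made only to show the translation step.

```python
from typing import Any, Dict, List


def generic_menu() -> List[Dict[str, Any]]:
    """Hypothetical generic backend output, identical for every frontend."""
    return [{"id": i, "name": f"Sandwich {i}", "price_cents": 500 + 25 * i}
            for i in range(1, 8)]


def ios_bff(items: List[Dict[str, Any]], page: int, page_size: int = 3) -> Dict[str, Any]:
    """Adapt the generic output for an assumed iOS client: small pages, display-ready prices."""
    start = (page - 1) * page_size
    chunk = items[start:start + page_size]
    return {
        "items": [{"title": item["name"], "price": f"${item['price_cents'] / 100:.2f}"}
                  for item in chunk],
        "page": page,
        "hasMore": start + page_size < len(items),
    }


def web_bff(items: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Adapt the same output for an assumed web client that renders the full list itself."""
    return {"items": items, "count": len(items)}


menu = generic_menu()
print(ios_bff(menu, page=1))
print(web_bff(menu)["count"])  # 7
```

Supporting another device later would mean adding one more adaptor function; the backend output stays untouched.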
Communication within either Silicon Sandwiches architecture can be synchronous: the architecture doesn’t require extreme performance or elasticity, and none of the operations will be lengthy.
Distributed Case Study: Going, Going, Gone
The Going, Going, Gone (GGG) kata presents more interesting architecture challenges. Based on the component analysis in “Case Study: Going, Going, Gone: Discovering Components” on page 112, this architecture needs differing architecture characteristics for different parts of the architecture. For example, architecture characteristics like availability and scalability will differ between roles like auctioneer and bidder.
The requirements for GGG also explicitly state certain ambitious levels of scale, elasticity, performance, and a host of other tricky operational architecture characteristics. The architect needs to choose a pattern that allows for a high degree of customization at a fine-grained level within the architecture. Of the candidate distributed architectures, either low-level event-driven or microservices matches most of the architecture characteristics. Of the two, microservices better supports differing operational architecture characteristics; purely event-driven architectures typically don’t separate pieces because of these operational architecture characteristics but are instead organized around communication style, orchestrated versus choreographed.
Achieving the stated performance will provide a challenge in microservices, but architects can often address any weak point of an architecture by designing to accommodate it. For example, while microservices naturally offers a high degree of scalability, architects commonly have to address specific performance issues caused by too much orchestration, too aggressive data separation, and so on.
An implementation of GGG using microservices is shown in Figure 18-3.
Figure 18-3. A microservices implementation of Going, Going, Gone
In Figure 18-3, each identified component became a service in the architecture, matching component and service granularity. GGG has three distinct user interfaces:
Bidder
The numerous bidders for the online auction.
Auctioneer
One per auction.
Streamer
Service responsible for streaming video and bid stream to the bidders. Note that this is a read-only stream, allowing optimizations not available if updates were necessary.
The following services appear in this design of the GGG architecture:
Bid Capture
Captures online bidder entries and asynchronously sends them to Bid Tracker. This service needs no persistence because it acts as a conduit for the online bids.
Bid Streamer
Streams the bids back to online participants in a high-performance, read-only stream.
Bid Tracker
Tracks bids from both Auctioneer Capture and Bid Capture. This is the component that unifies the two different information streams, ordering the bids as close to real time as possible. Note that both inbound connections to this service are asynchronous, allowing the developers to use message queues as buffers to handle very different rates of message flow.
Auctioneer Capture
Captures bids for the auctioneer. The result of quanta analysis in “Case Study: Going, Going, Gone: Discovering Components” on page 112 led the architect to separate Bid Capture and Auctioneer Capture because they have quite different architecture characteristics.
Auction Session
This manages the workflow of individual auctions.
Payment
Third-party payment provider that handles payment information after the Auction Session has completed the auction.
Video Capture
Captures the video stream of the live auction.
Video Streamer
Streams the auction video to online bidders.
The architect was careful to identify both synchronous and asynchronous communication styles in this architecture. The choice of asynchronous communication is primarily driven by the need to accommodate differing operational architecture characteristics between services. For example, if the Payment service can only process a new payment every 500 ms and a large number of auctions end at the same time, synchronous communication between the services would cause timeouts and other reliability headaches. By using message queues, the architect can add reliability to a critical part of the architecture that exhibits fragility.
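The Payment example can be made concrete with a back-of-the-envelope model. The 500 ms processing time comes from the text; the burst of 20 auctions and the 3-second caller timeout are hypothetical figures chosen for illustration, and the model assumes the Payment service handles one request at a time.

```python
from typing import List


def synchronous_outcomes(requests: int, service_time_s: float, timeout_s: float) -> List[bool]:
    """Each caller blocks while a one-at-a-time service works through the burst in order.

    Request i (0-based) finishes after (i + 1) * service_time_s seconds; it succeeds
    only if that is within the caller's timeout.
    """
    return [(i + 1) * service_time_s <= timeout_s for i in range(requests)]


def queued_drain_time_s(requests: int, service_time_s: float) -> float:
    """With a queue, every request is accepted immediately and drained at the service's pace."""
    return requests * service_time_s


burst = 20          # hypothetical: auctions ending at the same moment
service_time = 0.5  # from the text: one payment every 500 ms
timeout = 3.0       # hypothetical caller timeout in seconds

outcomes = synchronous_outcomes(burst, service_time, timeout)
print(sum(outcomes), "succeed,", outcomes.count(False), "time out")  # 6 succeed, 14 time out
print(queued_drain_time_s(burst, service_time), "seconds to drain the queue")  # 10.0 seconds
```

Under these assumptions the queue does not make payments any faster; it trades latency for reliability, so that all 20 payments are eventually processed instead of 14 callers timing out.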
In the final analysis, this design resolved to five quanta, identified in Figure 18-4.
Figure 18-4. The quanta boundaries for GGG
In Figure 18-4, the design includes quanta for Payment, Auctioneer, Bidder, Bidder Streams, and Bid Tracker, roughly corresponding to the services. Multiple instances are indicated by stacks of containers in the diagram. Using quantum analysis at the component design stage allowed the architect to more easily identify service, data, and communication boundaries.
Note that this isn’t the “correct” design for GGG, and it’s certainly not the only one. We don’t even suggest that it’s the best possible design, but it seems to have the least worst set of trade-offs. Choosing microservices, then intelligently using events and messages, allows the architecture to get the most out of a generic architecture pattern while still building a foundation for future development and expansion.
Techniques and Soft Skills
An effective software architect must understand not only the technical aspects of software architecture but also the primary techniques and soft skills necessary to think like an architect, guide development teams, and effectively communicate the architecture to various stakeholders. This section of the book addresses the key techniques and soft skills necessary to become an effective software architect.
CHAPTER 19
Architecture Decisions
One of the core expectations of an architect is to make architecture decisions. Architecture decisions usually involve the structure of the application or system, but they may involve technology decisions as well, particularly when those technology decisions impact architecture characteristics. Whatever the context, a good architecture decision is one that helps guide development teams in making the right technical choices. Making architecture decisions involves gathering enough relevant information, justifying the decision, documenting the decision, and effectively communicating that decision to the right stakeholders.
Architecture Decision Anti-Patterns
There is an art to making architecture decisions. Not surprisingly, several architecture anti-patterns emerge when making decisions as an architect. The programmer Andrew Koenig defines an anti-pattern as something that seems like a good idea when you begin, but leads you into trouble. Another definition of an anti-pattern is a repeatable process that produces negative results. The three major architecture anti-patterns that can (and usually do) emerge when making architecture decisions are the Covering Your Assets anti-pattern, the Groundhog Day anti-pattern, and the Email-Driven Architecture anti-pattern. These three anti-patterns usually follow a progressive flow: overcoming the Covering Your Assets anti-pattern leads to the Groundhog Day anti-pattern, and overcoming this anti-pattern leads to the Email-Driven Architecture anti-pattern. Making effective and accurate architecture decisions requires an architect to overcome all three of these anti-patterns.
Covering Your Assets Anti-Pattern
The first anti-pattern to emerge when trying to make architecture decisions is the Covering Your Assets anti-pattern. This anti-pattern occurs when an architect avoids or defers making an architecture decision out of fear of making the wrong choice.
There are two ways to overcome this anti-pattern. The first is to wait until the last responsible moment to make an important architecture decision. The last responsible moment means waiting until you have enough information to justify and validate your decision, but not waiting so long that you hold up development teams or fall into the Analysis Paralysis anti-pattern. The second way to avoid this anti-pattern is to continually collaborate with development teams to ensure that the decision you made can be implemented as expected. This is vitally important because it is not feasible for an architect to know every single detail about a particular technology and all the associated issues. By closely collaborating with development teams, the architect can respond quickly and change the architecture decision if issues occur.
To illustrate this point, suppose an architect makes the decision that all product-related reference data (product description, weight, and dimensions) be cached in all service instances needing that information using a read-only replicated cache, with the primary replica owned by the catalog service. A replicated cache means that if there are any changes to product information (or a new product is added), the catalog service would update its cache, which would then be replicated to all other services requiring that data through a replicated (in-memory) cache product. A good justification for this decision is to reduce coupling between the services and to effectively share data without having to make an interservice call. However, the development teams implementing this architecture decision find that due to certain scalability requirements of some of the services, this decision would require more in-process memory than is available. By closely collaborating with the development teams, the architect can quickly become aware of the issue and adjust the architecture decision to accommodate these situations.
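The kind of estimate that would surface this memory problem early can be sketched in a few lines. All figures are hypothetical (two million products, roughly 1,500 bytes per cached product, 2 GiB of memory per instance, a quarter of it available for the cache, 40 instances); the point is the shape of the calculation, not the numbers.

```python
def cache_memory_bytes(products: int, bytes_per_product: int) -> int:
    """Memory one full replica of the product reference data needs in a single instance."""
    return products * bytes_per_product


def fits_in_instance(products: int, bytes_per_product: int,
                     instance_memory_bytes: int, cache_share: float) -> bool:
    """True if one replica fits in the share of instance memory set aside for caching."""
    return cache_memory_bytes(products, bytes_per_product) <= instance_memory_bytes * cache_share


# Hypothetical figures, for illustration only.
products = 2_000_000
bytes_per_product = 1_500
instance_memory = 2 * 1024 ** 3  # 2 GiB per service instance
cache_share = 0.25               # fraction of memory the cache may use
instances = 40                   # a service scaled out to 40 instances

per_instance = cache_memory_bytes(products, bytes_per_product)
print(per_instance)              # 3000000000 bytes per replica
print(fits_in_instance(products, bytes_per_product, instance_memory, cache_share))  # False
print(per_instance * instances)  # 120000000000 bytes across all replicas
```

Because a replicated cache keeps a full copy in every instance, the total grows linearly with the number of instances, which is how a scalability requirement can turn a reasonable decision into an infeasible one.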
Groundhog Day Anti-Pattern
Once an architect overcomes the Covering Your Assets anti-pattern and starts making decisions, a second anti-pattern emerges: the Groundhog Day anti-pattern. The Groundhog Day anti-pattern occurs when people don’t know why a decision was made, so it keeps getting discussed over and over and over. The Groundhog Day anti-pattern gets its name from the Bill Murray movie Groundhog Day, where it was February 2 over and over every day.
The Groundhog Day anti-pattern occurs because once an architect makes an architecture decision, they fail to provide a justification for the decision (or a complete justification). When justifying architecture decisions, it is important to provide both technical and business justifications for your decision. For example, an architect may make the decision to break apart a monolithic application into separate services to decouple the functional aspects of the application so that each part of the application uses fewer virtual machine resources and can be maintained and deployed separately. While this is a good example of a technical justification, what is missing is the business justification; in other words, why should the business pay for this architectural refactoring? A good business justification for this decision might be to deliver new business functionality faster, therefore improving time to market. Another might be to reduce the costs associated with the development and release of new features.
Providing the business value when justifying decisions is vitally important for any architecture decision. It is also a good litmus test for determining whether the architecture decision should be made in the first place. If a particular architecture decision does not provide any business value, then perhaps it is not a good decision and should be reconsidered. 在为架构决策辩护时,提供商业价值对任何架构决策都是至关重要的。这也是判断是否应该做出架构决策的一个良好试金石。如果某个特定的架构决策没有提供任何商业价值,那么也许这不是一个好的决策,应该重新考虑。
Four of the most common business justifications include cost, time to market, user satisfaction, and strategic positioning. When focusing on these common business justifications, it is important to take into consideration what is important to the business stakeholders. Justifying a particular decision based on cost savings alone might not be the right decision if the business stakeholders are less concerned about cost and more concerned about time to market. 四个最常见的商业理由包括成本、市场时间、用户满意度和战略定位。在关注这些常见的商业理由时,重要的是要考虑业务利益相关者所重视的内容。仅仅基于成本节省来证明某个特定决策可能不是正确的决策,如果业务利益相关者对成本的关注较少,而对市场时间的关注较多。
Once an architect makes decisions and fully justifies those decisions, a third architecture anti-pattern emerges: Email-Driven Architecture. The Email-Driven Architecture anti-pattern is where people lose, forget, or don’t even know an architecture decision has been made and therefore cannot possibly implement that architecture decision. This anti-pattern is all about effectively communicating your architecture decisions. Email is a great tool for communication, but it makes a poor document repository system. 一旦架构师做出决策并充分证明这些决策,就会出现第三种架构反模式:Email-Driven Architecture。Email-Driven Architecture 反模式是指人们失去、忘记或甚至不知道已经做出了架构决策,因此无法实施该架构决策。这个反模式完全是关于有效沟通你的架构决策。电子邮件是一个很好的沟通工具,但它并不是一个好的文档存储系统。
There are many ways to communicate architecture decisions more effectively and thereby avoid the Email-Driven Architecture anti-pattern. The first rule of communicating architecture decisions is to not include the architecture decision in the body of an email. Including the decision in the body of the email creates multiple systems of record for that decision. Important details (including the justification) are often left out of the email, which creates the Groundhog Day anti-pattern all over again. Also, if that architecture decision is ever changed or superseded, how many people will receive the revised decision? A better approach is to mention only the nature and context of the decision in the body of the email and provide a link to the single system of record for the actual architecture decision and corresponding details (whether it be a link to a wiki page or a document in a filesystem).
The second rule of effectively communicating architecture decisions is to only notify those people who really care about the architecture decision. One effective technique is to write the body of the email as follows: 有效沟通架构决策的第二条规则是仅通知那些真正关心架构决策的人。一种有效的技巧是将电子邮件的正文写成如下:
“Hi Sandra, I’ve made an important decision regarding communication between services that directly impacts you. Please see the decision using the following link…” “嗨,桑德拉,我做出了一个重要决定,关于服务之间的通信,这直接影响到你。请通过以下链接查看该决定…”
Notice the phrasing in the first sentence: “important decision regarding communication between services.” Here, the context of the decision is mentioned, but not the actual decision itself. The second part of the first sentence is even more important: “that directly impacts you.” If an architectural decision doesn’t directly impact the person, then why bother that person with your architecture decision? This is a great litmus test for determining which stakeholders (including developers) should be notified directly of an architecture decision. The second sentence provides a link to the location of the architecture decision so it is located in only one place, hence a single system of record for the decision. 注意第一句中的措辞:“关于服务之间通信的重要决策。”在这里,提到了决策的背景,但没有提到实际的决策。第一句的第二部分更为重要:“直接影响到你。”如果一个架构决策没有直接影响到某个人,那么为什么要让这个人关心你的架构决策呢?这是一个很好的试金石,用于确定哪些利益相关者(包括开发人员)应该直接被通知架构决策。第二句提供了架构决策位置的链接,因此它只位于一个地方,从而形成了决策的单一记录系统。
Architecturally Significant 架构上重要
Many architects believe that if the architecture decision involves any specific technology, then it’s not an architecture decision, but rather a technical decision. This is not always true. If an architect makes a decision to use a particular technology because it directly supports a particular architecture characteristic (such as performance or scalability), then it’s an architecture decision. 许多架构师认为,如果架构决策涉及任何特定技术,那么这不是架构决策,而是技术决策。这并不总是正确的。如果架构师决定使用某种特定技术,因为它直接支持某个特定的架构特性(例如性能或可扩展性),那么这就是一个架构决策。
Michael Nygard, a well-known software architect and author of Release It! (Pragmatic Bookshelf), addressed the problem of what decisions an architect should be responsible for (and hence what is an architecture decision) by coining the term architecturally significant. According to Michael, architecturally significant decisions are those decisions that affect the structure, nonfunctional characteristics, dependencies, interfaces, or construction techniques. 迈克尔·尼加德(Michael Nygard),一位知名的软件架构师和《Release It!》(Pragmatic Bookshelf)的作者,提出了架构师应负责哪些决策(因此什么是架构决策)的问题,创造了“架构上重要”的术语。根据迈克尔的说法,架构上重要的决策是那些影响结构、非功能特性、依赖关系、接口或构建技术的决策。
The structure refers to decisions that impact the patterns or styles of architecture being used. An example of this is the decision to share data between a set of microservices. This decision impacts the bounded context of the microservice, and as such affects the structure of the application. 结构指的是影响所使用的架构模式或风格的决策。一个例子是决定在一组微服务之间共享数据。这个决策影响微服务的边界上下文,因此影响应用程序的结构。
The nonfunctional characteristics are the architecture characteristics ("-ilities") that are important for the application or system being developed or maintained. If a choice of technology impacts performance, and performance is an important aspect of the application, then it becomes an architecture decision. 非功能特性是架构特性(“-ilities”),对于正在开发或维护的应用程序或系统来说非常重要。如果技术选择影响性能,而性能是应用程序的重要方面,那么这就成为一个架构决策。
Dependencies refer to coupling points between components and/or services within the system, which in turn impact overall scalability, modularity, agility, testability, reliability, and so on. 依赖关系是指系统内组件和/或服务之间的耦合点,这反过来会影响整体的可扩展性、模块化、灵活性、可测试性、可靠性等。
Interfaces refer to how services and components are accessed and orchestrated, usually through a gateway, integration hub, service bus, or API proxy. Interfaces usually involve defining contracts, including the versioning and deprecation strategy of those contracts. Interfaces impact others using the system and hence are architecturally significant. 接口是指如何访问和协调服务与组件,通常通过网关、集成中心、服务总线或 API 代理进行。接口通常涉及定义合同,包括这些合同的版本控制和弃用策略。接口影响使用该系统的其他人,因此在架构上具有重要意义。
Finally, construction techniques refer to decisions about platforms, frameworks, tools, and even processes that, although technical in nature, might impact some aspect of the architecture. 最后,构建技术指的是关于平台、框架、工具甚至流程的决策,尽管这些决策在技术上具有性质,但可能会影响架构的某些方面。
Architecture Decision Records 架构决策记录
One of the most effective ways of documenting architecture decisions is through Architecture Decision Records (ADRs). ADRs were first evangelized by Michael Nygard in a blog post and later marked as “adopt” in the ThoughtWorks Technology Radar. An ADR consists of a short text file (usually one to two pages long) describing a specific architecture decision. While ADRs can be written using plain text, they are usually written in some sort of text document format like AsciiDoc or Markdown. Alternatively, an ADR can also be written using a wiki page template. 记录架构决策最有效的方法之一是通过架构决策记录(ADRs)。ADRs 最初由 Michael Nygard 在一篇博客文章中推广,后来在 ThoughtWorks 技术雷达中被标记为“采用”。一个 ADR 由一个简短的文本文件(通常一到两页长)组成,描述一个特定的架构决策。虽然 ADRs 可以使用纯文本编写,但通常使用某种文本文档格式,如 AsciiDoc 或 Markdown。或者,ADR 也可以使用维基页面模板编写。
Tooling is also available for managing ADRs. Nat Pryce, coauthor of Growing Object-Oriented Software Guided by Tests (Addison-Wesley), has written an open source tool for ADRs called ADR-tools. ADR-tools provides a command-line interface to manage ADRs, including the numbering schemes, locations, and superseded logic. Micha Kops, a software engineer from Germany, has written a blog post about using ADR-tools that provides some great examples of how the tool can be used to manage architecture decision records.
Basic Structure 基本结构
The basic structure of an ADR consists of five main sections: Title, Status, Context, Decision, and Consequences. We usually add two additional sections as part of the basic structure: Compliance and Notes. This basic structure (as illustrated in Figure 19-1) can be extended to include any other section deemed necessary, provided the template is kept both consistent and concise. A good example of this might be to add an Alternatives section, if necessary, to provide an analysis of all the other possible solutions.
The title of an ADR is usually numbered sequentially and contains a short phrase describing the architecture decision. For example, the decision to use asynchronous messaging between the Order Service and the Payment Service might read: “42. Use of Asynchronous Messaging Between Order and Payment Services.” The title should be descriptive enough to remove any ambiguity about the nature and context of the decision but at the same time be short and concise.
Status 状态
The status of an ADR can be marked as Proposed, Accepted, or Superseded. Proposed status means the decision must be approved by either a higher-level decision maker or some sort of architectural governance body (such as an architecture review board). Accepted status means the decision has been approved and is ready for implementation. A status of Superseded means the decision has been changed and superseded by another ADR. Superseded status always assumes the prior ADR status was Accepted; in other words, a Proposed ADR would never be superseded by another ADR, but would instead continue to be modified until accepted.
The Superseded status is a powerful way of keeping a historical record of what decisions were made, why they were made at that time, and what the new decision is and why it was changed. Usually, when an ADR has been superseded, it is marked with the decision that superseded it. Similarly, the decision that supersedes another ADR is marked with the ADR it superseded. For example, assume ADR 42 (“Use of Asynchronous Messaging Between Order and Payment Services”) was previously approved, but due to later changes to the implementation and location of the Payment Service, REST must now be used between the two services (ADR 68). The status would look as follows: 被取代状态是一种强有力的方式,用于保持历史记录,记录当时做出哪些决策、为什么做出这些决策以及新的决策是什么以及为什么发生了变化。通常,当一个 ADR 被取代时,它会标记为取代它的决策。同样,取代另一个 ADR 的决策也会标记为它所取代的 ADR。例如,假设 ADR 42(“在订单和支付服务之间使用异步消息传递”)之前已被批准,但由于后来的实施和支付服务位置的变化,现在必须在这两个服务之间使用 REST(ADR 68)。状态将如下所示:
ADR 42. Use of Asynchronous Messaging Between Order and Payment Services ADR 42. 在订单和支付服务之间使用异步消息传递
Status: Superseded by 68 状态:被 68 取代
ADR 68. Use of REST Between Order and Payment Services ADR 68. 在订单和支付服务之间使用 REST
Status: Accepted, supersedes 42 状态:已接受,取代 42
The link and history trail between ADRs 42 and 68 avoid the inevitable “what about using messaging?” question regarding ADR 68. ADRs 42 和 68 之间的链接和历史轨迹避免了关于 ADR 68 的不可避免的“那使用消息传递怎么样?”的问题。
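The two-way link between a superseded ADR and its replacement can be sketched as a small data model. This is a minimal illustration, not part of any ADR tool: the class and field names are hypothetical, and requiring the newer ADR to already be Accepted is an assumption drawn from the example status lines above.

```java
final class AdrStatusSketch {

    enum Status { PROPOSED, ACCEPTED, SUPERSEDED }

    /** Minimal ADR header: number, title, status, and supersede links. */
    static final class Adr {
        final int number;
        final String title;
        Status status;
        Integer supersededBy; // number of the ADR that replaced this one, if any
        Integer supersedes;   // number of the ADR this one replaced, if any

        Adr(int number, String title, Status status) {
            this.number = number;
            this.title = title;
            this.status = status;
        }

        /** Renders the Status line in the style of the example above. */
        String statusLine() {
            if (status == Status.SUPERSEDED) {
                return "Superseded by " + supersededBy;
            }
            String base = (status == Status.ACCEPTED) ? "Accepted" : "Proposed";
            return (supersedes == null) ? base : base + ", supersedes " + supersedes;
        }
    }

    /** Links both ADRs; a Proposed ADR is never superseded, only revised. */
    static void supersede(Adr older, Adr newer) {
        if (older.status != Status.ACCEPTED || newer.status != Status.ACCEPTED) {
            // Assumption: both ADRs must be Accepted before the link is recorded.
            throw new IllegalStateException("Both ADRs must be Accepted");
        }
        older.status = Status.SUPERSEDED;
        older.supersededBy = newer.number;
        newer.supersedes = older.number;
    }
}
```

Recording the link on both records means a reader who lands on either ADR can follow the trail to the other.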
ADRs and Request for Comments (RFC) ADRs 和请求评论 (RFC)
If an architect wishes to send out a draft ADR for comments (which is sometimes a good idea when the architect wants to validate various assumptions and assertions with a larger audience of stakeholders), we recommend creating a new status named Request for Comments (or RFC) and specifying a deadline by which the review will be complete. This practice avoids the Analysis Paralysis anti-pattern, where the decision is forever discussed but never actually made. Once that date is reached, the architect can analyze all the comments made on the ADR, make any necessary adjustments, make the final decision, and set the status to Proposed (unless the architect is able to approve the decision themselves, in which case the status would be set to Accepted). An example of an RFC status for an ADR would look as follows:
STATUS 状态
Request For Comments, Deadline 09 JAN 2010 请求评论,截止日期 2010 年 1 月 9 日
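A deadline is only useful if something checks it. The following sketch, with hypothetical class and method names, formats the RFC status line shown above and reports whether the review window has closed, treating the deadline date itself as the point at which comments can be analyzed.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

final class RfcStatusSketch {

    // Day, abbreviated English month, and year, e.g. "09 Jan 2010".
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("dd MMM uuuu", Locale.ENGLISH);

    /** Builds the status line for an ADR that is out for comments. */
    static String statusLine(LocalDate deadline) {
        return "Request For Comments, Deadline "
                + deadline.format(FORMAT).toUpperCase(Locale.ENGLISH);
    }

    /** True once the deadline date has been reached. */
    static boolean reviewClosed(LocalDate deadline, LocalDate today) {
        return !today.isBefore(deadline);
    }
}
```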
Another significant aspect of the Status section of an ADR is that it forces an architect to have necessary conversations with their boss or lead architect about the criteria with which they can approve an architecture decision on their own, or whether it must be approved through a higher-level architect, an architecture review board, or some other architecture governing body. ADR 的状态部分的另一个重要方面是,它迫使架构师与他们的老板或首席架构师进行必要的对话,讨论他们可以独立批准架构决策的标准,或者是否必须通过更高级别的架构师、架构评审委员会或其他架构管理机构进行批准。
Three criteria that form a good start for these conversations are cost, cross-team impact, and security. Cost can include software purchase or licensing fees, additional hardware costs, as well as the overall level of effort to implement the architecture decision. Level of effort costs can be estimated by multiplying the estimated number of hours to implement the architecture decision by the company’s standard Full-Time Equivalency (FTE) rate. The project owner or project manager usually has the FTE amount. If the cost of the architecture decision exceeds a certain amount, then it must be set to Proposed status and approved by someone else. If the architecture decision impacts other teams or systems or has any sort of security implication, then it cannot be self-approved by the architect and must be approved by a higher-level governing body or lead architect.
Once the criteria and corresponding limits have been established and agreed upon (such as “costs exceeding €5,000 must be approved by the architecture review board”), these criteria should be well documented so that all architects creating ADRs know when they can and cannot approve their own architecture decisions.
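The three criteria can be written down as a simple rule, which also makes the agreed limits explicit. The sketch below is illustrative only: the names are hypothetical, the FTE rate is assumed to be an hourly figure, and the €5,000 limit is taken from the example above.

```java
final class SelfApprovalSketch {

    /** Level-of-effort cost: estimated hours multiplied by the FTE rate (assumed hourly). */
    static long levelOfEffortCost(long estimatedHours, long fteHourlyRate) {
        return estimatedHours * fteHourlyRate;
    }

    /**
     * An architect may self-approve only when the cost stays within the agreed
     * limit and the decision has no cross-team impact or security implication.
     */
    static boolean canSelfApprove(long totalCost, long costLimit,
                                  boolean crossTeamImpact, boolean securityImplication) {
        return totalCost <= costLimit && !crossTeamImpact && !securityImplication;
    }

    public static void main(String[] args) {
        long cost = levelOfEffortCost(40, 100); // assumed: 40 hours at a rate of 100 per hour
        String status = canSelfApprove(cost, 5_000, false, false) ? "Accepted" : "Proposed";
        System.out.println("Cost " + cost + " -> initial status " + status);
    }
}
```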
Context 上下文
The context section of an ADR specifies the forces at play. In other words, “what situation is forcing me to make this decision?” This section of the ADR allows the architect to describe the specific situation or issue and concisely elaborate on the possible alternatives. If an architect is required to document the analysis of each alternative in detail, then an additional Alternatives section can be added to the ADR rather than adding that analysis to the Context section. ADR 的上下文部分指定了正在发挥作用的力量。换句话说,“是什么情况迫使我做出这个决定?”ADR 的这一部分允许架构师描述具体的情况或问题,并简明扼要地阐述可能的替代方案。如果要求架构师详细记录每个替代方案的分析,则可以在 ADR 中添加一个额外的替代方案部分,而不是将该分析添加到上下文部分。
The Context section also provides a way to document the architecture. By describing the context, the architect is also describing the architecture. This is an effective way of documenting a specific area of the architecture in a clear and concise manner. Continuing with the example from the prior section, the context might read as follows: “The order service must pass information to the payment service to pay for an order currently being placed. This could be done using REST or asynchronous messaging.” Notice that this concise statement not only specified the scenario, but also the alternatives. 上下文部分还提供了一种记录架构的方法。通过描述上下文,架构师也在描述架构。这是一种以清晰简洁的方式记录架构特定领域的有效方法。继续之前部分的示例,上下文可能如下所示:“订单服务必须将信息传递给支付服务,以支付当前正在下的订单。这可以通过 REST 或异步消息传递来完成。”请注意,这个简洁的陈述不仅指定了场景,还列出了替代方案。
Decision 决策
The Decision section of the ADR contains the architecture decision, along with a full justification for the decision. Michael Nygard introduced a great way of stating an architecture decision by using a very affirmative, commanding voice rather than a passive one. For example, the decision to use asynchronous messaging between services would read “we will use asynchronous messaging between services.” This is a much better way of stating a decision than “I think asynchronous messaging between services would be the best choice.” In the second phrasing it is not clear what the decision is, or even whether a decision has been made; only the opinion of the architect is stated.
Perhaps one of the most powerful aspects of the Decision section of ADRs is that it allows an architect to place more emphasis on the why rather than the how. Understanding why a decision was made is far more important than understanding how something works. Most architects and developers can identify how things work by looking at context diagrams, but not why a decision was made. Knowing why a decision was made and the corresponding justification for the decision helps people better understand the context of the problem and avoids possible mistakes through refactoring to another solution that might produce issues. 也许 ADR 决策部分最强大的一个方面是,它允许架构师更强调“为什么”而不是“如何”。理解一个决策为何做出比理解某个事物如何运作要重要得多。大多数架构师和开发人员可以通过查看上下文图来识别事物是如何运作的,但却无法理解为何做出某个决策。知道为何做出某个决策及其相应的理由有助于人们更好地理解问题的背景,并避免通过重构到可能产生问题的其他解决方案而导致的错误。
To illustrate this point, consider an original architecture decision several years ago to use Google’s Remote Procedure Call (gRPC) as a means to communicate between two services. Without understanding why that decision was made, another architect several years later makes the choice to override that decision and use messaging instead to better decouple the services. However, implementing this refactoring suddenly causes a significant increase in latency, which in turn ultimately causes time outs to occur in upstream systems. Understanding that the original use of gRPC was to significantly reduce latency (at the cost of tightly coupled services) would have prevented the refactoring from happening in the first place. 为了说明这一点,考虑几年前做出的一个原始架构决策,即使用 Google 的远程过程调用(gRPC)作为两个服务之间通信的手段。如果不理解这个决策的原因,几年后另一位架构师选择推翻这个决策,而是使用消息传递来更好地解耦服务。然而,实施这个重构突然导致延迟显著增加,进而最终导致上游系统发生超时。理解原本使用 gRPC 是为了显著减少延迟(以紧密耦合的服务为代价)本可以防止重构的发生。
Consequences 后果
The Consequences section of an ADR is another very powerful section. This section documents the overall impact of an architecture decision. Every architecture decision an architect makes has some sort of impact, both good and bad. Having to specify the impact of an architecture decision forces the architect to think about whether those impacts outweigh the benefits of the decision. ADR 的后果部分是另一个非常强大的部分。该部分记录了架构决策的整体影响。架构师所做的每一个架构决策都有某种影响,无论是好是坏。必须指定架构决策的影响迫使架构师思考这些影响是否超过了决策的好处。
Another good use of this section is to document the trade-off analysis associated with the architecture decision. These trade-offs could be cost-based or trade-offs against other architecture characteristics (“-ilities”). For example, consider the decision to use asynchronous (fire-and-forget) messaging to post a review on a website. The justification for this decision is to significantly improve the responsiveness of the post review request, from 3,100 milliseconds to 25 milliseconds, because users would not need to wait for the actual review to be posted (only for the message to be sent to a queue). While this is a good justification, someone else might argue that this is a bad idea due to the complexity of the error handling associated with an asynchronous request (“what happens if someone posts a review with some bad words?”). Unknown to the person challenging this decision, that issue was already discussed with the business stakeholders and other architects, and it was decided from a trade-off perspective that it was more important to gain the responsiveness and deal with the complex error handling than to make the user wait for synchronous confirmation that the review was successfully posted. By leveraging ADRs, that trade-off analysis can be included in the Consequences section, providing a complete picture of the context (and trade-offs) of the architecture decision and thus avoiding these situations.
Compliance 合规性
The Compliance section of an ADR is not one of the standard sections in an ADR, but it’s one we highly recommend adding. The Compliance section forces the architect to think about how the architecture decision will be measured and governed from a compliance perspective. The architect must decide whether the compliance check for this decision must be manual or whether it can be automated using a fitness function. If it can be automated, the architect can then specify in this section how that fitness function would be written and whether any other changes to the code base are needed to measure this architecture decision for compliance.
For example, consider the following architecture decision within a traditional n-tiered layered architecture as illustrated in Figure 19-2. All shared objects used by business objects in the business layer will reside in the shared services layer to isolate and contain shared functionality.
Figure 19-2. An example of an architecture decision 图 19-2. 架构决策的示例
This architecture decision can be measured and governed automatically by using either ArchUnit in Java or NetArchTest in C#. For example, using ArchUnit in Java, the automated fitness function test might look as follows: 此架构决策可以通过使用 Java 中的 ArchUnit 或 C# 中的 NetArchTest 自动测量和管理。例如,使用 Java 中的 ArchUnit,自动化的适应性函数测试可能如下所示:
@Test
public void shared_services_should_reside_in_services_layer() {
    // myClasses: the imported classes under test, defined elsewhere in the test class
    classes().that().areAnnotatedWith(SharedService.class)
        .should().resideInAPackage("..services..")
        .because("All shared services classes used by business " +
            "objects in the business layer should reside in the services " +
            "layer to isolate and contain shared logic")
        .check(myClasses);
}
Notice that this automated fitness function would require new stories to be written to create a new Java annotation (@SharedService) and to then add this annotation to all shared classes. This section also specifies what the test is, where the test can be found, and how the test will be executed and when. 请注意,这个自动化的适应性函数需要编写新的故事来创建一个新的 Java 注解(@SharedService),然后将这个注解添加到所有共享类中。本节还指定了测试是什么,测试可以在哪里找到,以及测试将如何执行和何时执行。
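The annotation those stories would introduce could look like the sketch below. Only the name @SharedService comes from the text; the retention and target settings and the two example classes are assumptions for illustration. RUNTIME retention is chosen here so the small reflection check works; SOURCE retention would not survive compilation and so could not be detected in compiled classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

/** Marks a class as shared functionality that belongs in the shared services layer. */
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface SharedService { }

/** Hypothetical shared class that would carry the annotation. */
@SharedService
final class CurrencyConverter { }

/** Hypothetical business object that is not shared, so it is left unannotated. */
final class OrderValidator { }

final class SharedServiceCheck {

    /** Reflection-based stand-in for what an architecture test would look for. */
    static boolean isSharedService(Class<?> type) {
        return type.isAnnotationPresent(SharedService.class);
    }
}
```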
Notes 笔记
Another section that is not part of a standard ADR but that we highly recommend adding is the Notes section. This section includes various metadata about the ADR, such as the following: 另一个不是标准 ADR 的一部分但我们强烈建议添加的部分是备注部分。该部分包括有关 ADR 的各种元数据,例如以下内容:
Original author 原作者
Approval date 批准日期
Approved by 批准通过
Superseded date 取代日期
Last modified date 最后修改日期
Modified by 修改者
Last modification 最后修改
Even when storing ADRs in a version control system (such as Git), additional meta-information is useful beyond what the repository can support, so we recommend adding this section regardless of how and where ADRs are stored.
Storing ADRs 存储 ADRs
Once an architect creates an ADR, it must be stored somewhere. Regardless of where ADRs are stored, each architecture decision should have its own file or wiki page. Some architects like to keep ADRs in the Git repository with the source code. Keeping ADRs in a Git repository allows the ADR to be versioned and tracked as well. However, for larger organizations we caution against this practice for several reasons. First, everyone who needs to see the architecture decision may not have access to the Git repository. Second, this is not a good place to store ADRs that have a context outside of the application Git repository (such as integration architecture decisions, enterprise architecture decisions, or those decisions common to every application). 一旦架构师创建了 ADR,它必须存储在某个地方。无论 ADR 存储在哪里,每个架构决策都应该有自己的文件或维基页面。一些架构师喜欢将 ADR 保存在与源代码一起的 Git 仓库中。在 Git 仓库中保留 ADR 可以使 ADR 进行版本控制和跟踪。然而,对于较大的组织,我们出于几个原因对这种做法表示谨慎。首先,并不是所有需要查看架构决策的人都可以访问 Git 仓库。其次,这不是存储具有应用程序 Git 仓库之外上下文的 ADR 的好地方(例如集成架构决策、企业架构决策或适用于每个应用程序的那些决策)。
For these reasons we recommend storing ADRs either in a wiki (using a wiki template) or in a shared directory on a shared file server that can be accessed easily by a wiki or other document rendering software. Figure 19-3 shows an example of what this directory structure (or wiki page navigation structure) might look like. 出于这些原因,我们建议将 ADR 存储在维基中(使用维基模板)或在可以被维基或其他文档渲染软件轻松访问的共享文件服务器上的共享目录中。图 19-3 显示了该目录结构(或维基页面导航结构)可能的样子。
Figure 19-3. Example directory structure for storing ADRs 图 19-3. 存储 ADRs 的示例目录结构
The application directory contains those architecture decisions that are specific to some sort of application context. This directory is subdivided into further directories. The common subdirectory is for architecture decisions that apply to all applications, such as “All framework-related classes will contain an annotation (@Framework in Java) or attribute ([Framework] in C#) identifying the class as belonging to the underlying framework code.” Subdirectories under the application directory correspond to the specific application or system context and contain the architecture decisions specific to that application or system (in this example, the ATP and PSTD applications). The integration directory contains those ADRs that involve the communication between application, systems, or services. Enterprise architecture ADRs are contained within the enterprise directory, indicating that these are global architecture decisions impacting all systems and applications. An example of an enterprise architecture ADR would be “All access to a system database will only be from the owning system,” thus preventing the sharing of databases across multiple systems. 应用程序目录包含特定于某种应用程序上下文的架构决策。该目录进一步细分为多个子目录。公共子目录用于适用于所有应用程序的架构决策,例如“所有与框架相关的类将包含一个注释(在 Java 中为@Framework)或属性(在 C#中为[Framework]),以标识该类属于基础框架代码。” 应用程序目录下的子目录对应于特定的应用程序或系统上下文,并包含该应用程序或系统特定的架构决策(在此示例中为 ATP 和 PSTD 应用程序)。集成目录包含涉及应用程序、系统或服务之间通信的 ADR。企业架构 ADR 包含在企业目录中,表明这些是影响所有系统和应用程序的全球架构决策。企业架构 ADR 的一个例子是“对系统数据库的所有访问仅来自拥有该系统”,从而防止跨多个系统共享数据库。
When storing ADRs in a wiki (our recommendation), the same structure previously described applies, with each directory structure representing a navigational landing page. Each ADR would be represented as a single wiki page within each navigational landing page (Application, Integration, or Enterprise). 在维基中存储 ADR(我们的推荐)时,之前描述的相同结构适用,每个目录结构代表一个导航着陆页。每个 ADR 将在每个导航着陆页(应用、集成或企业)中表示为一个单独的维基页面。
The directory or landing page names indicated in this section are only a recommendation. Each company can choose whatever names fit their situation, as long as those names are consistent across teams. 本节中指示的目录或着陆页名称仅为建议。每个公司可以选择适合其情况的名称,只要这些名称在各个团队之间保持一致。
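Whatever names a company chooses, deriving each ADR's location from its scope, number, and title keeps the structure consistent across teams. The sketch below is one hypothetical convention: the zero-padded file name, the .md extension, and the lowercase directory names are assumptions, not something this section prescribes.

```java
import java.util.Locale;

final class AdrLocationSketch {

    /** Builds a file name such as "0042-use-of-rest.md" from a number and title. */
    static String fileName(int number, String title) {
        String slug = title.toLowerCase(Locale.ENGLISH)
                .replaceAll("[^a-z0-9]+", "-")  // collapse anything non-alphanumeric
                .replaceAll("^-|-$", "");       // trim a leading or trailing hyphen
        return String.format("%04d-%s.md", number, slug);
    }

    /** Decisions specific to one application, or "common" for all applications. */
    static String applicationAdr(String application, int number, String title) {
        return String.join("/", "application",
                application.toLowerCase(Locale.ENGLISH), fileName(number, title));
    }

    /** Decisions about communication between applications, systems, or services. */
    static String integrationAdr(int number, String title) {
        return String.join("/", "integration", fileName(number, title));
    }

    /** Global decisions impacting all systems and applications. */
    static String enterpriseAdr(int number, String title) {
        return String.join("/", "enterprise", fileName(number, title));
    }
}
```

The same paths can serve as wiki page names when ADRs are kept in a wiki rather than in a shared directory.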
ADRs as Documentation ADRs 作为文档
Documenting software architecture has always been a difficult topic. While some standards are emerging for diagramming architecture (such as software architect Simon Brown’s C4 Model or The Open Group ArchiMate standard), no such standard exists for documenting software architecture. That’s where ADRs come in. 记录软件架构一直是一个困难的话题。虽然一些标准正在出现用于架构图示(例如软件架构师 Simon Brown 的 C4 模型或开放组 ArchiMate 标准),但目前没有针对软件架构文档的标准。这就是 ADRs 发挥作用的地方。
Architecture Decision Records can be used as an effective means to document a software architecture. The Context section of an ADR provides an excellent opportunity to describe the specific area of the system that requires an architecture decision to be made. This section also provides an opportunity to describe the alternatives. Perhaps more important is that the Decision section describes the reasons why a particular decision is made, which is by far the best form of architecture documentation. The Consequences section adds the final piece to the architecture documentation by describing additional aspects of a particular decision, such as the trade-off analysis of choosing performance over scalability.
Using ADRs for Standards 使用 ADRs 进行标准化
Very few people like standards. Most times standards seem to be in place more for controlling people and the way they do things than anything useful. Using ADRs for standards can change this bad practice. For example, the Context section of an ADR describes the situation that is forcing the particular standard. The Decision section of an ADR can be used to not only indicate what the standard is, but more importantly why the standard needs to exist. This is a wonderful way of being able to qualify whether the particular standard should even exist in the first place. If an architect cannot justify the standard, then perhaps it is not a good standard to make and enforce. Furthermore, the more developers understand why a particular standard exists, the more likely they are to follow it (and correspondingly not challenge it). The Consequences section of an ADR is another great place an architect can qualify whether a standard is valid and should be made. In this section the architect must think about and document what the implications and consequences are of a particular standard they are making. By analyzing the consequences, the architect might decide that the standard should not be applied after all. 很少有人喜欢标准。大多数时候,标准似乎更多是为了控制人们及其做事方式,而不是出于任何有用的目的。使用 ADR 作为标准可以改变这种不良做法。例如,ADR 的上下文部分描述了迫使特定标准产生的情况。ADR 的决策部分不仅可以用来指明标准是什么,更重要的是可以说明为什么这个标准需要存在。这是一个很好的方式来判断特定标准是否应该存在。如果一个架构师无法证明这个标准的合理性,那么也许这个标准并不适合制定和执行。此外,开发者越了解特定标准存在的原因,他们就越可能遵循它(相应地也不太会质疑它)。ADR 的后果部分是架构师可以判断标准是否有效并应该制定的另一个好地方。在这一部分,架构师必须考虑并记录他们所制定的特定标准的影响和后果。 通过分析后果,架构师可能会决定最终不应用该标准。
Example 示例
Many architecture decisions exist within our ongoing “Case Study: Going, Going, Gone” on page 95. The use of event-driven microservices, the splitting up of the bidder and auctioneer user interfaces, the use of the Real-time Transport Protocol (RTP) for video capture, the use of a single API layer, and the use of publish-and-subscribe messaging are just a few of the dozens of architecture decisions that are made for this auction system. Every architecture decision made in a system, no matter how obvious, should be documented and justified. 在我们正在进行的“案例研究:进行中、进行中、已结束”中,第 95 页存在许多架构决策。使用事件驱动的微服务、拆分竞标者和拍卖师用户界面、使用实时传输协议(RTP)进行视频捕获、使用单一 API 层以及使用发布-订阅消息传递只是为该拍卖系统做出的数十个架构决策中的几个。系统中做出的每一个架构决策,无论多么明显,都应该被记录和证明。
Figure 19-4 illustrates one of the architecture decisions within the Going, Going, Gone auction system, which is the use of publish-and-subscribe (pub/sub) messaging between the bid capture, bid streamer, and bid tracker services. 图 19-4 展示了在 Going, Going, Gone 拍卖系统中的一个架构决策,即在投标捕获、投标流和投标跟踪服务之间使用发布-订阅(pub/sub)消息传递。
Figure 19-4. Use of pub/sub between services 图 19-4. 服务之间的 pub/sub 使用
The ADR for this architecture decision might look similar to Figure 19-5:
CONTEXT
The Bid Capture Service, upon receiving a bid from an online bidder or from a live bidder via the auctioneer, must forward that bid on to the Bid Streamer Service and the Bidder Tracker Service. This could be done using asynchronous point-to-point (p2p) messaging, asynchronous publish-and-subscribe (pub/sub) messaging, or REST via the Online Auction API Layer.
DECISION 决策
We will use asynchronous pub/sub messaging between the Bid Capture Service, Bid Streamer Service, and the Bidder Tracker Service. 我们将在投标捕获服务、投标流服务和投标者跟踪服务之间使用异步发布/订阅消息传递。
The Bid Capture Service does not need any information back from the Bid Streamer Service or Bidder Tracker Service. 投标捕获服务不需要来自投标流服务或投标者跟踪服务的任何信息。
The Bid Streamer Service must receive bids in the exact order they were accepted by the Bid Capture Service. Using messaging and queues automatically guarantees the bid order for the stream. 投标流服务必须按照投标捕获服务接受投标的确切顺序接收投标。使用消息和队列自动保证了流的投标顺序。
Using async pub/sub messaging will increase the performance of the bidding process and allow for extensibility of bidding information. 使用异步发布/订阅消息将提高竞标过程的性能,并允许竞标信息的可扩展性。
CONSEQUENCES 后果
We will require clustering and high availability of the message queues. 我们将需要消息队列的集群和高可用性。
Internal bid events will be bypassing security checks done in the API layer. 内部投标事件将绕过在 API 层进行的安全检查。
UPDATE: Upon review at the April 14th, 2020 ARB meeting, the ARB decided that this was an acceptable trade-off and no additional security checks would be needed for bid events between these services. 更新:在 2020 年 4 月 14 日的 ARB 会议上,ARB 决定这是一个可接受的权衡,因此在这些服务之间的投标事件中不需要额外的安全检查。
COMPLIANCE 合规性
We will use periodic manual code and design reviews to ensure that asynchronous pub/sub messaging is being used between the Bid Capture Service, Bid Streamer Service, and the Bidder Tracker Service. 我们将使用定期的手动代码和设计审查,以确保在投标捕获服务、投标流服务和投标者跟踪服务之间使用异步发布/订阅消息传递。
NOTES 备注
Author: Subashini Nadella 作者:Subashini Nadella
Approved By: ARB Meeting Members, 14 APRIL 2020 批准人:ARB 会议成员,2020 年 4 月 14 日
Last Updated: 15 APRIL 2020 by Subashini Nadella 最后更新:2020 年 4 月 15 日,Subashini Nadella
Figure 19-5. ADR 76. Asynchronous Pub/Sub Messaging Between Bidding Services 图 19-5. ADR 76. 竞标服务之间的异步发布/订阅消息传递
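To make the decision in ADR 76 concrete, the following is a minimal in-process sketch of the pub/sub shape it describes. It is not the auction system's implementation: the `Topic` class, the subscriber names, and the bid payload are all hypothetical, and a real deployment would use a message broker whose ordering and availability guarantees would need to be checked separately.

```python
from collections import deque
from typing import Deque, Dict, List


class Topic:
    """Hypothetical in-memory topic: each subscriber has its own FIFO queue."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[dict]] = {}

    def subscribe(self, name: str) -> None:
        # Register a subscriber; it only sees messages published afterward.
        self._queues[name] = deque()

    def publish(self, message: dict) -> None:
        # Fire and forget: the publisher needs nothing back from subscribers.
        for queue in self._queues.values():
            queue.append(message)

    def drain(self, name: str) -> List[dict]:
        # Return this subscriber's pending messages in publish order.
        queue = self._queues[name]
        items = list(queue)
        queue.clear()
        return items


bids = Topic()
bids.subscribe("bid_streamer")
bids.subscribe("bidder_tracker")

# The Bid Capture Service publishes each accepted bid exactly once.
for amount in (100, 125, 150):
    bids.publish({"bidder": "hypothetical-bidder", "amount": amount})

streamed = [m["amount"] for m in bids.drain("bid_streamer")]
tracked = [m["amount"] for m in bids.drain("bidder_tracker")]
```

Both subscribers receive every bid in the order it was published, and adding a third subscriber requires no change to the publisher, which is the extensibility argument made in the Decision section.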
Analyzing Architecture Risk 分析架构风险
Every architecture has risk associated with it, whether it be risk involving availability, scalability, or data integrity. Analyzing architecture risk is one of the key activities of architecture. By continually analyzing risk, the architect can address deficiencies within the architecture and take corrective action to mitigate the risk. In this chapter we introduce some of the key techniques and practices for qualifying risk, creating risk assessments, and identifying risk through an activity called risk storming. 每种架构都有相关的风险,无论是涉及可用性、可扩展性还是数据完整性的风险。分析架构风险是架构的关键活动之一。通过持续分析风险,架构师可以解决架构中的不足,并采取纠正措施以降低风险。在本章中,我们介绍了一些关键技术和实践,用于评估风险、创建风险评估以及通过一种称为风险风暴的活动识别风险。
Risk Matrix 风险矩阵
The first issue that arises when assessing architecture risk is determining whether the risk should be classified as low, medium, or high. Too much subjectiveness usually enters into this classification, creating confusion about which parts of the architecture are really high risk versus medium risk. Fortunately, there is a risk matrix architects can leverage to help reduce the level of subjectiveness and qualify the risk associated with a particular area of the architecture. 评估架构风险时出现的第一个问题是确定风险应被分类为低、中或高。通常,这种分类中会引入过多的主观性,从而导致对架构中哪些部分是真正的高风险与中风险产生混淆。幸运的是,架构师可以利用风险矩阵来帮助减少主观性,并对架构特定区域相关的风险进行定性。
The architecture risk matrix (illustrated in Figure 20-1) uses two dimensions to qualify risk: the overall impact of the risk and the likelihood of that risk occurring. Each dimension has a low (1), medium (2), and high (3) rating. These numbers are multiplied together within each cell of the matrix, providing an objective number representing that risk. Numbers 1 and 2 are considered low risk (green), numbers 3 and 4 are considered medium risk (yellow), and numbers 6 through 9 are considered high risk (red).
Figure 20-1. Matrix for determining architecture risk 图 20-1. 确定架构风险的矩阵
To see how the risk matrix can be used, suppose there is a concern about availability with regard to a primary central database used in the application. First, consider the impact dimension: what is the overall impact if the database goes down or becomes unavailable? Here, an architect might deem that high risk, making that risk either a 3 (medium), 6 (high), or 9 (high). However, after applying the second dimension (likelihood of risk occurring), the architect realizes that the database is on highly available servers in a clustered configuration, so the likelihood is low that the database would become unavailable. Therefore, the intersection between the high impact and low likelihood gives an overall risk rating of 3 (medium risk).
When leveraging the risk matrix to qualify the risk, consider the impact dimension first and the likelihood dimension second. 在利用风险矩阵来评估风险时,首先考虑影响维度,其次考虑可能性维度。
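The matrix arithmetic can be sketched in a few lines of Python. The function and constant names here are illustrative assumptions; only the 1 to 3 ratings, the multiplication, and the low/medium/high bands come from the text.

```python
LOW, MEDIUM, HIGH = 1, 2, 3


def risk_score(impact: int, likelihood: int) -> int:
    """Multiply the two matrix dimensions; each must be rated 1, 2, or 3."""
    for rating in (impact, likelihood):
        if rating not in (LOW, MEDIUM, HIGH):
            raise ValueError(f"rating must be 1, 2, or 3, got {rating}")
    return impact * likelihood


def risk_level(score: int) -> str:
    """Map a matrix score to its band: 1-2 low, 3-4 medium, 6-9 high."""
    if score <= 2:
        return "low"
    if score <= 4:
        return "medium"
    return "high"


# The central database example: impact first (high), likelihood second (low).
database_availability = risk_score(impact=HIGH, likelihood=LOW)
database_level = risk_level(database_availability)
```

For the database example this yields a score of 3, which falls in the medium band, matching the rating reached in the walkthrough above.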
Risk Assessments 风险评估
The risk matrix described in the previous section can be used to build what is called a risk assessment. A risk assessment is a summarized report of the overall risk of an architecture with respect to some sort of contextual and meaningful assessment criteria. 前一节中描述的风险矩阵可以用来构建所谓的风险评估。风险评估是关于架构整体风险的总结报告,涉及某种上下文和有意义的评估标准。
Risk assessments can vary greatly, but in general they contain the risk (qualified from the risk matrix) of some assessment criteria based on services or domain areas of an application. This basic risk assessment report format is illustrated in Figure 20-2, where light gray (1-2) is low risk, medium gray (3-4) is medium risk, and dark gray (6-9) is high risk. Usually these are color-coded as green (low), yellow (medium), and red (high), but shading can be useful for black-and-white rendering and for color blindness. 风险评估可能会有很大差异,但一般来说,它们包含基于应用程序的服务或领域区域的一些评估标准的风险(从风险矩阵中确定)。这种基本的风险评估报告格式在图 20-2 中进行了说明,其中浅灰色(1-2)表示低风险,中灰色(3-4)表示中风险,深灰色(6-9)表示高风险。通常这些用颜色编码为绿色(低)、黄色(中)和红色(高),但阴影在黑白渲染和色盲情况下也很有用。
Figure 20-2. Example of a standard risk assessment 图 20-2. 标准风险评估示例
The quantified risk from the risk matrix can be accumulated by the risk criteria and also by the service or domain area. For example, notice in Figure 20-2 that the accumulated risk for data integrity is the highest risk area at a total of 17, whereas the accumulated risk for availability is only 10 (the least amount of risk). The relative risk of each domain area can also be determined by the example risk assessment. Here, customer registration carries the highest area of risk, whereas order fulfillment carries the lowest risk. These relative numbers can then be tracked to demonstrate either improvements or degradation of risk within a particular risk category or domain area.
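Accumulating risk along both axes is a simple pair of sums. The sketch below uses made-up scores and shortened domain names purely for illustration; it does not reproduce the values in Figure 20-2.

```python
# Hypothetical assessment: criteria -> domain area -> risk matrix score.
assessment = {
    "availability": {"registration": 2, "checkout": 1, "fulfillment": 3},
    "performance": {"registration": 4, "checkout": 6, "fulfillment": 2},
    "data integrity": {"registration": 6, "checkout": 9, "fulfillment": 1},
}

# Accumulated risk per criteria (sum across each row).
by_criteria = {criteria: sum(row.values()) for criteria, row in assessment.items()}

# Accumulated risk per domain area (sum down each column).
by_domain = {}
for row in assessment.values():
    for domain, score in row.items():
        by_domain[domain] = by_domain.get(domain, 0) + score

riskiest_criteria = max(by_criteria, key=by_criteria.get)
riskiest_domain = max(by_domain, key=by_domain.get)
```

Storing these totals at each review gives the relative numbers that can be tracked over time.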
Although the risk assessment example in Figure 20-2 contains all the risk analysis results, rarely is it presented as such. Filtering is essential for visually indicating a particular message within a given context. For example, suppose an architect is in a meeting for the purpose of presenting areas of the system that are high risk. Rather than presenting the risk assessment as illustrated in Figure 20-2, filtering can be used to only show the high risk areas (shown in Figure 20-3), improving the overall signal-to-noise ratio and presenting a clear picture of the state of the system (good or bad). 尽管图 20-2 中的风险评估示例包含所有风险分析结果,但很少以这种方式呈现。在给定上下文中,过滤对于视觉上指示特定信息至关重要。例如,假设一位架构师在会议上旨在展示系统中高风险的区域。与其呈现如图 20-2 所示的风险评估,不如使用过滤仅显示高风险区域(如图 20-3 所示),从而改善整体信噪比,并清晰地展示系统的状态(好或坏)。
Figure 20-3. Filtering the risk assessment to only high risk 图 20-3. 过滤风险评估以仅显示高风险
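Filtering of this kind amounts to dropping every cell below the high-risk threshold. A minimal sketch, again with hypothetical scores rather than the figure's data, and with `filter_high` as an assumed helper name:

```python
# Hypothetical assessment: criteria -> domain area -> risk matrix score.
assessment = {
    "availability": {"registration": 2, "checkout": 1, "fulfillment": 3},
    "performance": {"registration": 4, "checkout": 6, "fulfillment": 2},
    "data integrity": {"registration": 6, "checkout": 9, "fulfillment": 1},
}


def filter_high(assessment: dict, threshold: int = 6) -> dict:
    """Keep only cells at or above the threshold; drop rows left empty."""
    filtered = {}
    for criteria, row in assessment.items():
        high = {domain: score for domain, score in row.items() if score >= threshold}
        if high:
            filtered[criteria] = high
    return filtered


high_only = filter_high(assessment)
```

The availability row disappears entirely because none of its cells reach 6, which is the signal-to-noise improvement the filtered view is meant to provide.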
Another issue with Figure 20-2 is that this assessment report only shows a snapshot in time; it does not show whether things are improving or getting worse. In other words, Figure 20-2 does not show the direction of risk. Rendering the direction of risk presents somewhat of an issue. If an up or down arrow were to be used to indicate direction, what would an up arrow mean? Are things getting better or worse? We’ve spent years asking people if an up arrow meant things were getting better or worse, and almost 50% of people asked said that the up arrow meant things were progressively getting worse, whereas almost 50% said an up arrow indicated things were getting better. The same is true for left and right arrows. For this reason, when using arrows to indicate direction, a key must be used. However, we’ve also found this doesn’t work either. Once the user scrolls beyond the key, confusion happens once again.
We usually use the universal direction symbol of a plus (+) and minus (-) sign next to the risk rating to indicate direction, as illustrated in Figure 20-4. Notice in Figure 20-4 that although performance for customer registration is medium (4), the direction is a minus sign (red), indicating that it is progressively getting worse and heading toward high risk. On the other hand, notice that scalability of catalog checkout is high (6) with a plus sign (green), showing that it is improving. Risk ratings without a plus or minus sign indicate that the risk is stable and neither getting better nor worse. 我们通常使用加号 (+) 和减号 (-) 的通用方向符号放在风险评级旁边以指示方向,如图 20-4 所示。请注意,在图 20-4 中,尽管客户注册的性能为中等 (4),但方向是减号(红色),这表明它正在逐渐变差,趋向于高风险。另一方面,请注意,目录结账的可扩展性为高 (6),并带有加号(绿色),显示它正在改善。没有加号或减号的风险评级表示风险是稳定的,既没有变好也没有变坏。
Figure 20-4. Showing direction of risk with plus and minus signs 图 20-4. 用加号和减号表示风险的方向
Occasionally, even the plus and minus signs can be confusing to some people. Another technique for indicating direction is to leverage an arrow along with the risk rating number it is trending toward. This technique, as illustrated in Figure 20-5, does not require a key because the direction is clear. Furthermore, the use of colors (red arrow for worse, green arrow for better) makes it even more clear where the risk is heading. 有时,正负号对某些人来说也可能令人困惑。指示方向的另一种技术是利用箭头以及它所趋向的风险评级数字。正如图 20-5 所示,这种技术不需要图例,因为方向是明确的。此外,使用颜色(红色箭头表示更糟,绿色箭头表示更好)使风险的走向更加清晰。
Figure 20-5. Showing direction of risk with arrows and numbers 图 20-5. 用箭头和数字显示风险的方向
The direction of risk can be determined by using continuous measurements through fitness functions described earlier in the book. By objectively analyzing each risk criteria, trends can be observed, providing the direction of each risk criteria. 风险的方向可以通过使用本书前面描述的适应度函数进行持续测量来确定。通过客观分析每个风险标准,可以观察到趋势,从而提供每个风险标准的方向。
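One way to derive the direction marker from a series of measurements is sketched below. The `direction` helper and the sample histories are assumptions for illustration; the plus (improving) and minus (worsening) convention follows Figure 20-4 as described above.

```python
from typing import Sequence


def direction(history: Sequence[int]) -> str:
    """Compare the newest score with the oldest one in the window.

    Returns "-" when risk has risen (getting worse), "+" when it has
    fallen (improving), and "" when it is stable or there is too little data.
    """
    if len(history) < 2:
        return ""
    change = history[-1] - history[0]
    if change > 0:
        return "-"
    if change < 0:
        return "+"
    return ""


def label(history: Sequence[int]) -> str:
    """Render the current rating with its direction marker, e.g. '4-'."""
    return f"{history[-1]}{direction(history)}"


# Hypothetical score histories, oldest first.
registration_performance = label([3, 4, 4])  # medium, trending worse
checkout_scalability = label([9, 6])         # still high, but improving
stable_area = label([4, 4])
```

Comparing only the endpoints is a deliberate simplification; a team with frequent fitness function measurements might prefer a moving average or a fitted trend.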
Risk Storming 风险风暴
No architect can single-handedly determine the overall risk of a system. The reason for this is twofold. First, a single architect might miss or overlook a risk area; second, very few architects have full knowledge of every part of the system. This is where risk storming can help.
Risk storming is a collaborative exercise used to determine architectural risk within a specific dimension. Common dimensions (areas of risk) include unproven technology, performance, scalability, availability (including transitive dependencies), data loss, single points of failure, and security. While most risk storming efforts involve multiple architects, it is wise to include senior developers and tech leads as well. Not only will they provide an implementation perspective to the architectural risk, but involving developers helps them gain a better understanding of the architecture. 风险风暴是一种协作练习,用于确定特定维度内的架构风险。常见的维度(风险领域)包括未验证的技术、性能、可扩展性、可用性(包括传递依赖)、数据丢失、单点故障和安全性。虽然大多数风险风暴工作涉及多个架构师,但明智的做法是也包括高级开发人员和技术负责人。他们不仅会为架构风险提供实施视角,而且让开发人员参与有助于他们更好地理解架构。
The risk storming effort involves both an individual part and a collaborative part. In the individual part, all participants individually (without collaboration) assign risk to areas of the architecture using the risk matrix described in the previous section. This noncollaborative part of risk storming is essential so that participants don’t influence or direct attention away from particular areas of the architecture. In the collaborative part of risk storming, all participants work together to gain consensus on risk areas, discuss risk, and form solutions for mitigating the risk. 风险风暴的工作包括个人部分和协作部分。在个人部分,所有参与者独立(不进行协作)使用前一部分中描述的风险矩阵为架构的各个领域分配风险。这一非协作的风险风暴部分至关重要,以确保参与者不会影响或转移对架构特定领域的关注。在风险风暴的协作部分,所有参与者共同努力达成对风险领域的共识,讨论风险,并形成减轻风险的解决方案。
An architecture diagram is used for both parts of the risk storming effort. For holistic risk assessments, usually a comprehensive architecture diagram is used, whereas risk storming within specific areas of the application would use a contextual architecture diagram. It is the responsibility of the architect conducting the risk storming effort to make sure these diagrams are up to date and available to all participants. 架构图用于风险风暴工作的两个部分。对于整体风险评估,通常使用综合架构图,而在应用程序的特定领域进行风险风暴时则使用上下文架构图。进行风险风暴工作的架构师有责任确保这些图表是最新的,并且对所有参与者可用。
Figure 20-6 shows an example architecture we’ll use to illustrate the risk storming process. In this architecture, an Elastic Load Balancer fronts each EC2 instance containing the web servers (Nginx) and application services. The application services make calls to a MySQL database, a Redis cache, and a MongoDB database for logging. They also make calls to the Push Expansion Servers. The expansion servers, in turn, all interface with the MySQL database, Redis cache, and MongoDB logging facility. 图 20-6 展示了一个示例架构,我们将用它来说明风险风暴过程。在这个架构中,一个弹性负载均衡器位于每个包含 Web 服务器(Nginx)和应用服务的 EC2 实例前面。应用服务调用 MySQL 数据库、Redis 缓存和 MongoDB 数据库进行日志记录。它们还调用推送扩展服务器。扩展服务器则与 MySQL 数据库、Redis 缓存和 MongoDB 日志记录设施进行接口。
Figure 20-6. Architecture diagram for risk storming example 图 20-6. 风险风暴示例的架构图
Risk storming is broken down into three primary activities: 风险风暴分为三个主要活动:
1. Identification
2. Consensus
3. Mitigation
Identification is always an individual, noncollaborative activity, whereas consensus and mitigation are always collaborative and involve all participants working together in the same room (at least virtually). Each of these primary activities is discussed in detail in the following sections. 识别始终是一个个体的、非协作的活动,而共识和缓解始终是协作的,涉及所有参与者在同一个房间(至少是虚拟的)共同工作。以下各节将详细讨论这些主要活动。
Identification 识别
The identification activity of risk storming involves each participant individually identifying areas of risk within the architecture. The following steps describe the identification part of the risk storming effort: 风险风暴的识别活动涉及每个参与者单独识别架构中的风险领域。以下步骤描述了风险风暴工作中的识别部分:
1. The architect conducting the risk storming sends out an invitation to all participants one to two days prior to the collaborative part of the effort. The invitation contains the architecture diagram (or the location of where to find it), the risk storming dimension (area of risk being analyzed for that particular risk storming effort), the date when the collaborative part of risk storming will take place, and the location.
2. Using the risk matrix described in the first section of this chapter, participants individually analyze the architecture and classify the risk as low (1-2), medium (3-4), or high (6-9).
3. Participants prepare small Post-it notes with corresponding colors (green, yellow, and red) and write down the corresponding risk number (found on the risk matrix).
Most risk storming efforts only involve analyzing one particular dimension (such as performance), but there might be times, due to the availability of staff or timing issues, when multiple dimensions are analyzed within a single risk storming effort (such as performance, scalability, and data loss). When multiple dimensions are analyzed within a single risk storming effort, the participants write the dimension next to the risk number on the Post-it notes so that everyone is aware of the specific dimension. For example, suppose three participants found risk within the central database. All three identified the risk as high (6), but one participant found risk with respect to availability, whereas two participants found risk with respect to performance. These two dimensions would be discussed separately. 大多数风险风暴的努力仅涉及分析一个特定的维度(例如性能),但由于人员的可用性或时间问题,有时可能会在一次风险风暴中分析多个维度(例如性能、可扩展性和数据丢失)。当在一次风险风暴中分析多个维度时,参与者会在便签上将维度写在风险编号旁边,以便每个人都能意识到特定的维度。例如,假设三位参与者在中央数据库中发现了风险。所有三位参与者都将风险评估为高(6),但一位参与者发现了与可用性相关的风险,而另外两位参与者发现了与性能相关的风险。这两个维度将分别进行讨论。
Whenever possible, restrict risk storming efforts to a single dimension. This allows participants to focus their attention on that specific dimension and avoids confusion about multiple risk areas being identified for the same area of the architecture.
Consensus 共识
The consensus activity in the risk storming effort is highly collaborative with the goal of gaining consensus among all participants regarding the risk within the architecture. This activity is most effective when a large, printed version of the architecture diagram is available and posted on the wall. In lieu of a large printed version, an electronic version can be displayed on a large screen. 在风险风暴活动中的共识活动是高度协作的,目的是在所有参与者之间就架构中的风险达成共识。当有一个大型的架构图打印版本可用并张贴在墙上时,这项活动最为有效。如果没有大型打印版本,可以在大屏幕上显示电子版本。
Upon arrival at the risk storming session, participants begin placing their Post-it notes on the architecture diagram in the area where they individually found risk. If an electronic version is used, the architect conducting the risk storming session queries every participant and electronically places the risk on the diagram in the area of the architecture where the risk was identified (see Figure 20-7). 在风险风暴会议开始时,参与者开始将他们的便利贴放置在架构图中他们个人发现风险的区域。如果使用电子版本,进行风险风暴会议的架构师会询问每位参与者,并在架构图中电子地将风险放置在识别风险的区域(见图 20-7)。
Figure 20-7. Initial identification of risk areas 图 20-7. 风险区域的初步识别
Once all of the Post-it notes are in place, the collaborative part of risk storming can begin. The goal of this activity of risk storming is to analyze the risk areas as a team and gain consensus in terms of the risk qualification. Notice several areas of risk were identified in the architecture, illustrated in Figure 20-7: 一旦所有的便利贴都到位,风险风暴的协作部分就可以开始了。风险风暴活动的目标是作为一个团队分析风险领域,并在风险资格方面达成共识。注意在架构中识别出几个风险领域,如图 20-7 所示:
1. Two participants individually identified the Elastic Load Balancer as medium risk (3), whereas one participant identified it as high risk (6).
2. One participant individually identified the Push Expansion Servers as high risk (9).
3. Three participants individually identified the MySQL database as medium risk (3).
4. One participant individually identified the Redis cache as high risk (9).
5. Three participants identified MongoDB logging as low risk (2).
All other areas of the architecture were not deemed to carry any risk, hence there are no Post-it notes on any other areas of the architecture. 架构的其他所有领域都被认为没有任何风险,因此在架构的其他领域没有任何便签。
Items 3 and 5 in the prior list do not need further discussion in this activity since all participants agreed on the level and qualification of risk. However, notice there was a difference of opinion in item 1 in the list, and items 2 and 4 only had a single participant identifying the risk. These items need to be discussed during this activity. 在之前的列表中,第 3 项和第 5 项在本次活动中无需进一步讨论,因为所有参与者都同意风险的级别和资格。然而,请注意,第 1 项存在意见分歧,而第 2 项和第 4 项只有一位参与者识别了风险。这些项目需要在本次活动中讨论。
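The triage rule just described (discuss an area when ratings differ or when not everyone flagged it) can be sketched as follows. The scores come from the list above; the participant count of three is an assumption inferred from the statement that all participants agreed on items 3 and 5, and `needs_discussion` is a hypothetical helper.

```python
# Post-it scores per area, taken from the five items listed above.
notes = {
    "Elastic Load Balancer": [3, 3, 6],
    "Push Expansion Servers": [9],
    "MySQL database": [3, 3, 3],
    "Redis cache": [9],
    "MongoDB logging": [2, 2, 2],
}

PARTICIPANTS = 3  # assumption: three people took part in the session


def needs_discussion(scores: list, participants: int) -> bool:
    """Flag an area when ratings disagree or only some participants saw risk."""
    disagreement = len(set(scores)) > 1
    partial_coverage = len(scores) < participants
    return disagreement or partial_coverage


to_discuss = [
    area for area, scores in notes.items()
    if needs_discussion(scores, PARTICIPANTS)
]
```

This yields the Elastic Load Balancer, the Push Expansion Servers, and the Redis cache, which are items 1, 2, and 4 in the list.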
Item 1 in the list showed that two participants individually identified the Elastic Load Balancer as medium risk (3), whereas one participant identified it as high risk (6). In this case the other two participants ask the third participant why they identified the risk as high. Suppose the third participant says that they assigned the risk as high because if the Elastic Load Balancer goes down, the entire system cannot be accessed. While this is true and in fact does bring the overall impact rating to high, the other two participants convince the third participant that there is low risk of this happening. After much discussion, the third participant agrees, bringing that risk level down to a medium (3). However, the first and second participants might not have seen a particular aspect of risk in the Elastic Load Balancer that the third did, hence the need for collaboration within this activity of risk storming. 列表中的项目 1 显示,两个参与者分别将弹性负载均衡器识别为中等风险(3),而一个参与者将其识别为高风险(6)。在这种情况下,其他两个参与者询问第三个参与者为什么将风险识别为高。假设第三个参与者说他们将风险评定为高是因为如果弹性负载均衡器出现故障,整个系统将无法访问。虽然这确实是事实,并且确实将整体影响评级提高到高,但其他两个参与者说服第三个参与者认为这种情况发生的风险很低。经过多次讨论,第三个参与者同意,将该风险水平降低到中等(3)。然而,第一和第二个参与者可能没有看到弹性负载均衡器中第三个参与者所看到的特定风险方面,因此在风险风暴活动中需要进行协作。
Case in point, consider item 2 in the prior list where one participant individually identified the Push Expansion Servers as high risk (9), whereas no other participant identified them as any risk at all. In this case, all other participants ask the participant who identified the risk why they rated it as high. That participant then says that they have had bad experiences with the Push Expansion Servers continually going down under high load, something this particular architecture has. This example shows the value of risk storming: without that participant’s involvement, no one would have seen the high risk (until well into production, of course!).
Item 4 in the list is an interesting case. One participant identified the Redis cache as high risk (9), whereas no other participant saw that cache as any risk in the architecture. The other participants ask what the rationale is for the high risk in that area, and the one participant responds with, “What is a Redis cache?” In this case, Redis was unknown to the participant, hence the high risk in that area. 列表中的第 4 项是一个有趣的案例。一位参与者将 Redis 缓存视为高风险(9),而其他参与者则认为该缓存在架构中没有任何风险。其他参与者询问该领域高风险的理由,而那位参与者回答:“什么是 Redis 缓存?”在这种情况下,参与者对 Redis 并不熟悉,因此在该领域存在高风险。
For unproven or unknown technologies, always assign the highest risk rating (9) since the risk matrix cannot be used for this dimension. 对于未经验证或未知的技术,始终分配最高风险评级(9),因为风险矩阵无法用于此维度。
The example of item 4 in the list illustrates why it is wise (and important) to bring developers into risk storming sessions. Not only can developers learn more about the architecture, but the fact that one participant (who was in this case a developer on the team) didn’t know a given technology provides the architect with valuable information regarding overall risk. 列表中第 4 项的例子说明了为什么将开发人员引入风险风暴会议是明智(且重要)的。不仅开发人员可以更多地了解架构,而且其中一位参与者(在这种情况下是团队中的一名开发人员)对某项技术的不了解为架构师提供了有关整体风险的宝贵信息。
This process continues until all participants agree on the risk areas identified. Once all the Post-it notes are consolidated, this activity ends, and the next one can begin. The final outcome of this activity is shown in Figure 20-8. 这个过程持续进行,直到所有参与者对识别出的风险领域达成一致。一旦所有的便利贴被整合,这个活动就结束了,下一项活动可以开始。这个活动的最终结果如图 20-8 所示。
Figure 20-8. Consensus of risk areas 图 20-8. 风险领域的共识
Mitigation 缓解
Once all participants agree on the qualification of the risk areas of the architecture, the final and most important activity occurs: risk mitigation. Mitigating risk within an architecture usually involves changes or enhancements to certain areas of the architecture that otherwise might have been deemed perfect the way they were.
This activity, which is also usually collaborative, seeks ways to reduce or eliminate the risk identified in the first activity. There may be cases where the original architecture needs to be completely changed based on the identification of risk, whereas others might be a straightforward architecture refactoring, such as adding a queue for back pressure to reduce a throughput bottleneck issue. 此活动通常也是协作性的,旨在寻找减少或消除在第一项活动中识别的风险的方法。可能会出现需要根据风险识别完全更改原始架构的情况,而其他情况可能只是简单的架构重构,例如添加一个队列以应对背压,从而减少吞吐量瓶颈问题。
Regardless of the changes required in the architecture, this activity usually incurs additional cost. For that reason, key stakeholders typically decide whether the cost outweighs the risk. For example, suppose that through a risk storming session the central database was identified as being medium risk (4) with regard to overall system availability. In this case, the participants agreed that clustering the database, combined with breaking the single database into separate physical databases, would mitigate that risk. However, while risk would be significantly reduced, this solution would cost $20,000. The architect would then conduct a meeting with the key business stakeholder to discuss this trade-off. During this negotiation, the business owner decides that the price tag is too high and that the risk does not justify the cost. Rather than giving up, the architect then suggests a different approach: what about skipping the clustering and splitting the database into two parts? The cost in this case is reduced to $8,000 while still mitigating most of the risk. In this case, the stakeholder agrees to the solution.
The previous scenario shows the impact risk storming can have not only on the overall architecture, but also on negotiations between architects and business stakeholders. Risk storming, combined with the risk assessments described at the start of this chapter, provides an excellent vehicle for identifying and tracking risk, improving the architecture, and handling negotiations between key stakeholders.
Agile Story Risk Analysis 敏捷故事风险分析
Risk storming can be used for other aspects of software development besides just architecture. For example, we’ve leveraged risk storming for determining overall risk of user story completion within a given Agile iteration (and consequently the overall risk assessment of that iteration) during story grooming. Using the risk matrix, user story risk can be identified by the first dimension (the overall impact if the story is not completed within the iteration) and the second dimension (the likelihood that the story will not be completed). By utilizing the same architecture risk matrix for stories, teams can identify stories of high risk, track those carefully, and prioritize them. 风险风暴不仅可以用于架构,还可以用于软件开发的其他方面。例如,我们在故事梳理过程中利用风险风暴来确定在给定的敏捷迭代中用户故事完成的整体风险(因此也评估该迭代的整体风险)。使用风险矩阵,可以通过第一维(如果故事在迭代中未完成的整体影响)和第二维(故事未完成的可能性)来识别用户故事的风险。通过对故事使用相同的架构风险矩阵,团队可以识别高风险故事,仔细跟踪这些故事,并优先处理它们。
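Applied to a backlog, the same two dimensions give a simple ordering. The story names and ratings below are invented for illustration; impact is the effect of the story missing the iteration, and likelihood is the chance that it will.

```python
# Hypothetical stories: (name, impact if not completed, likelihood of slipping).
stories = [
    ("story-a", 3, 3),
    ("story-b", 1, 2),
    ("story-c", 2, 2),
    ("story-d", 3, 1),
]

# Score each story with the risk matrix, then list the riskiest first.
scored = [(name, impact * likelihood) for name, impact, likelihood in stories]
prioritized = sorted(scored, key=lambda item: item[1], reverse=True)

# Stories in the high band (6-9) are the ones to track closely.
high_risk = [name for name, score in prioritized if score >= 6]
```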
Risk Storming Examples 风险风暴示例
To illustrate the power of risk storming and how it can improve the overall architecture of a system, consider the example of a call center system to support nurses advising patients on various health conditions. The requirements for such a system are as follows: 为了说明风险风暴的力量以及它如何改善系统的整体架构,考虑一个呼叫中心系统的例子,该系统支持护士就各种健康状况向患者提供建议。该系统的需求如下:
The system will use a third-party diagnostics engine that serves up questions and guides the nurses or patients regarding their medical issues. 该系统将使用一个第三方诊断引擎,提供问题并指导护士或患者解决他们的医疗问题。
Patients can either call in using the call center to speak to a nurse or choose to use a self-service website that accesses the diagnostic engine directly, bypassing the nurses. 患者可以通过呼叫中心拨打电话与护士交谈,或者选择使用自助服务网站直接访问诊断引擎,绕过护士。
The system must support 250 concurrent nurses nationwide and up to hundreds of thousands of concurrent self-service patients nationwide. 该系统必须支持全国范围内 250 名并发护士以及数十万名并发自助患者。
Nurses can access patients’ medical records through a medical records exchange, but patients cannot access their own medical records. 护士可以通过医疗记录交换访问患者的医疗记录,但患者无法访问自己的医疗记录。
The system must be HIPAA compliant with regard to the medical records. This means that it is essential that no one but nurses have access to medical records. 该系统必须符合 HIPAA 关于医疗记录的要求。这意味着只有护士可以访问医疗记录。
Outbreaks and high volume during cold and flu season need to be addressed in the system. 在感冒和流感季节,疫情和高发量需要在系统中得到处理。
Call routing to nurses is based on the nurse’s profile (such as bilingual needs). 呼叫路由到护士是基于护士的个人资料(例如双语需求)。
The third-party diagnostic engine can handle about 500 requests a second. 第三方诊断引擎可以处理大约每秒 500 个请求。
The architect of the system created the high-level architecture illustrated in Figure 20-9. In this architecture there are three separate web-based user interfaces: one for self-service, one for nurses receiving calls, and one for administrative staff to add and maintain the nursing profile and configuration settings. The call center portion of the system consists of a call accepter which receives calls and the call router which routes calls to the next available nurse based on their profile (notice how the call router accesses the central database to get nurse profile information). Central to this architecture is a diagnostics system API gateway, which performs security checks and directs the request to the appropriate backend service. 系统的架构师创建了图 20-9 所示的高层架构。在这个架构中,有三个独立的基于网络的用户界面:一个用于自助服务,一个用于接听电话的护士,另一个用于行政人员添加和维护护理档案及配置设置。系统的呼叫中心部分由一个接听电话的呼叫接收器和一个呼叫路由器组成,呼叫路由器根据护士的档案将电话路由到下一个可用的护士(注意呼叫路由器如何访问中央数据库以获取护士档案信息)。这个架构的核心是一个诊断系统 API 网关,它执行安全检查并将请求指向适当的后端服务。
Figure 20-9. High-level architecture for nurse diagnostics system example 图 20-9. 护士诊断系统示例的高层架构
There are four main services in this system: a case management service, a nurse profile management service, an interface to the medical records exchange, and the external third-party diagnostics engine. All communications are using REST with the exception of proprietary protocols to the external systems and call center services. 该系统有四个主要服务:案件管理服务、护士档案管理服务、医疗记录交换接口和外部第三方诊断引擎。所有通信都使用 REST,外部系统和呼叫中心服务除外,使用专有协议。
The architect has reviewed this architecture numerous times and believes it is ready for implementation. As a self-assessment, study the requirements and the architecture diagram in Figure 20-9 and try to determine the level of risk within this architecture in terms of availability, elasticity, and security. After determining the level of risk, then determine what changes would be needed in the architecture to mitigate that risk. The sections that follow contain scenarios that can be used as a comparison. 架构师已经多次审查了这个架构,并相信它已准备好实施。作为自我评估,请研究需求和图 20-9 中的架构图,并尝试确定该架构在可用性、弹性和安全性方面的风险水平。在确定风险水平后,接着确定为了降低该风险需要对架构进行哪些更改。接下来的部分包含可以用作比较的场景。
Availability 可用性
During the first risk storming exercise, the architect chose to focus on availability first since system availability is critical for the success of this system. After the risk storming identification and collaboration activities, the participants came up with the following risk areas using the risk matrix (as illustrated in Figure 20-10): 在第一次风险风暴演练中,架构师选择首先关注可用性,因为系统可用性对该系统的成功至关重要。在风险风暴识别和协作活动之后,参与者使用风险矩阵(如图 20-10 所示)提出了以下风险领域:
The use of a central database was identified as high risk (6) due to high impact (3) and medium likelihood (2). 由于高影响(3)和中等可能性(2),使用中央数据库被确定为高风险(6)。
The diagnostics engine availability was identified as high risk (9) due to high impact (3) and unknown likelihood (3). 诊断引擎的可用性被识别为高风险(9),由于高影响(3)和未知可能性(3)。
The medical records exchange availability was identified as low risk (2) since it is not a required component for the system to run. 医疗记录交换的可用性被确定为低风险(2),因为它不是系统运行所需的组件。
Other parts of the system were not deemed as risk for availability due to multiple instances of each service and clustering of the API gateway. 系统的其他部分由于每个服务的多个实例和 API 网关的集群,未被视为可用性风险。
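The scores in the list above come from multiplying impact by likelihood. A minimal sketch of that arithmetic follows. The `Risk` class and its field names are hypothetical, and the low/medium/high bands (1-2, 3-4, 6-9) are assumed from the 3x3 risk matrix rather than stated in this section.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    """One risk area rated on a 3x3 matrix (hypothetical helper, not from the book)."""
    area: str
    impact: int      # 1 = low, 2 = medium, 3 = high
    likelihood: int  # 1 = low, 2 = medium, 3 = high (an unknown likelihood is rated 3)

    @property
    def score(self) -> int:
        # Overall risk is the product of the two dimensions.
        return self.impact * self.likelihood

    @property
    def level(self) -> str:
        # Assumed bands: 1-2 low, 3-4 medium, 6-9 high.
        if self.score <= 2:
            return "low"
        if self.score <= 4:
            return "medium"
        return "high"


# Only the two areas whose impact and likelihood are both given in the text.
availability_risks = [
    Risk("central database", impact=3, likelihood=2),
    Risk("diagnostics engine", impact=3, likelihood=3),
]

for risk in availability_risks:
    print(f"{risk.area}: {risk.level} ({risk.score})")
```

The medical records exchange is left out because the text gives only its total (2), not how that total splits between impact and likelihood.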
Figure 20-10. Availability risk areas 图 20-10. 可用性风险领域
During the risk storming effort, all participants agreed that while nurses can manually write down case notes if the database went down, the call router could not function if the database were not available. To mitigate the database risk, participants chose to break apart the single physical database into two separate databases: one clustered database containing the nurse profile information, and one single instance database for the case notes. Not only did this architecture change address the concerns about availability of the database, but it also helped secure the case notes from admin access. Another option to mitigate this risk would have been to cache the nurse profile information in the call router. However, because the implementation of the call router was unknown and may be a third-party product, the participants went with the database approach. 在风险风暴会议中,所有参与者一致认为,虽然护士可以在数据库宕机时手动记录案例笔记,但如果数据库不可用,呼叫路由器将无法正常工作。为了降低数据库风险,参与者选择将单一的物理数据库拆分为两个独立的数据库:一个集群数据库用于存储护士档案信息,另一个单实例数据库用于存储案例记录。这种架构变化不仅解决了数据库可用性的问题,还帮助保护了案例记录不被管理员访问。另一种降低此风险的选择是将护士档案信息缓存到呼叫路由器中。然而,由于呼叫路由器的实现未知,可能是第三方产品,参与者选择了数据库的方法。
Mitigating the risk of availability of the external systems (diagnostics engine and medical records exchange) is much harder to manage due to the lack of control of these systems. One way to mitigate this sort of availability risk is to research if there is a published service-level agreement (SLA) or service-level objective (SLO) for each of these systems. An SLA is usually a contractual agreement and is legally binding, whereas an SLO is usually not. Based on research, the architect found that the SLA for the diagnostics engine is guaranteed to be 99.99% available (that’s 52.60 minutes of downtime per year), and the medical records exchange is guaranteed at 99.9% availability (that’s 8.77 hours of downtime per year). Based on the relative risk, this information was enough to remove the identified risk. 由于缺乏对外部系统(诊断引擎和医疗记录交换)的控制,降低这些系统可用性的风险管理要困难得多。缓解这种可用性风险的一种方法是研究是否有针对这些系统的已发布服务水平协议(SLA)或服务水平目标(SLO)。SLA 通常是具有法律约束力的合同协议,而 SLO 通常不是。根据研究,架构师发现诊断引擎的 SLA 保证可用性为 99.99%(每年停机时间为 52.60 分钟),医疗记录交换的可用性保证为 99.9%(每年停机时间为 8.77 小时)。根据相对风险,这些信息足以消除识别出的风险。
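The downtime figures quoted for each SLA can be checked with a short calculation. This sketch assumes an average year of 365.25 days, which is the convention that reproduces the numbers in the text. The function name is hypothetical.

```python
# Assumption: an average year of 365.25 days, i.e. 8,766 hours.
HOURS_PER_YEAR = 365.25 * 24


def yearly_downtime_hours(availability_percent: float) -> float:
    """Hours per year a system may be down while still meeting its availability target."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


diagnostics_minutes = yearly_downtime_hours(99.99) * 60  # diagnostics engine SLA
exchange_hours = yearly_downtime_hours(99.9)             # medical records exchange SLA

print(f"99.99% available: {diagnostics_minutes:.2f} minutes of downtime per year")
print(f"99.9% available: {exchange_hours:.2f} hours of downtime per year")
```

Each additional "nine" cuts the permitted downtime by a factor of ten, which is why the two external systems differ so much even though both percentages look close to 100.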
The corresponding changes to the architecture after this risk storming session are illustrated in Figure 20-11. Notice that two databases are now used, and also the SLAs are published on the architecture diagram. 在这次风险风暴会议后,对架构的相应更改如图 20-11 所示。请注意,现在使用了两个数据库,并且服务水平协议(SLA)也在架构图上发布。
In the second risk storming exercise, the architect chose to focus on elasticity: spikes in user load (otherwise known as variable scalability). Although there are only 250 nurses (which provides an automatic governor for most of the services), the self-service portion of the system can access the diagnostics engine as well as nurses, significantly increasing the number of requests to the diagnostics interface. Participants were concerned about outbreaks and flu season, when anticipated load on the system would significantly increase. 在第二次风险风暴演练中,架构师选择专注于弹性,即用户负载的峰值(也称为可变可扩展性)。尽管只有 250 名护士(这为大多数服务提供了自动调节器),但系统的自助服务部分可以访问诊断引擎以及护士,显著增加了对诊断接口的请求数量。参与者担心疫情和流感季节,当时系统的预期负载将显著增加。
During the risk storming session, the participants all identified the diagnostics engine interface as high risk (9). With only 500 requests per second, the participants calculated that there was no way the diagnostics engine interface could keep up with the anticipated throughput, particularly with the current architecture utilizing REST as the interface protocol. 在风险风暴会议中,参与者们都将诊断引擎接口识别为高风险(9)。在每秒仅有 500 个请求的情况下,参与者们计算出诊断引擎接口无法跟上预期的吞吐量,特别是在当前架构使用 REST 作为接口协议的情况下。
One way to mitigate this risk is to leverage asynchronous queues (messaging) between the API gateway and the diagnostics engine interface to provide a backpressure point if calls to the diagnostics engine get backed up. While this is a good practice, it still doesn’t mitigate the risk, because nurses (as well as self-service patients) would be waiting too long for responses from the diagnostics engine, and those requests would likely time out. Leveraging what is known as the Ambulance Pattern would give nurses a higher priority over self-service. Therefore two message channels would be needed. While this technique helps mitigate the risk, it still doesn’t address the wait times. The participants decided that in addition to the queuing technique to provide back-pressure, caching the particular diagnostics questions related to an outbreak would remove outbreak and flu calls from ever having to reach the diagnostics engine interface. 一种减轻这种风险的方法是利用 API 网关和诊断引擎接口之间的异步队列(消息传递),以提供一个背压点,以防对诊断引擎的调用被阻塞。虽然这是一个好的做法,但它仍然无法减轻风险,因为护士(以及自助服务的患者)将会等待诊断引擎的响应太久,而这些请求很可能会超时。利用被称为“救护车模式”的方法将使护士的优先级高于自助服务。因此,需要两个消息通道。虽然这种技术有助于减轻风险,但它仍然没有解决等待时间的问题。参与者决定,除了排队技术提供背压外,缓存与疫情相关的特定诊断问题将使疫情和流感的呼叫不必到达诊断引擎接口。
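The two-channel idea behind the Ambulance Pattern can be sketched as a dispatcher that always drains the nurse channel before the self-service channel. This is an in-memory illustration only. The class and method names are hypothetical, and a real system would use a message broker rather than Python deques.

```python
from collections import deque
from typing import Optional


class AmbulanceDispatcher:
    """Two message channels; nurse requests are always served first (hypothetical sketch)."""

    def __init__(self) -> None:
        self._nurse_channel: deque = deque()
        self._self_service_channel: deque = deque()

    def submit(self, request: str, from_nurse: bool) -> None:
        # Each kind of caller gets its own channel so priorities stay separate.
        channel = self._nurse_channel if from_nurse else self._self_service_channel
        channel.append(request)

    def next_request(self) -> Optional[str]:
        # Strict priority: self-service is only served when no nurse request is waiting.
        if self._nurse_channel:
            return self._nurse_channel.popleft()
        if self._self_service_channel:
            return self._self_service_channel.popleft()
        return None


dispatcher = AmbulanceDispatcher()
dispatcher.submit("self-service question 1", from_nurse=False)
dispatcher.submit("nurse question 1", from_nurse=True)
dispatcher.submit("self-service question 2", from_nurse=False)
print(dispatcher.next_request())  # the nurse request, even though it arrived second
```

Strict priority like this can starve the self-service channel under sustained nurse load, and it does nothing to shorten the wait itself. That limitation is why the participants paired the queues with caching.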
The corresponding architecture changes are illustrated in Figure 20-12. Notice that in addition to two queue channels (one for the nurses and one for self-service patients), there is a new service called the Diagnostics Outbreak Cache Server that handles all requests related to a particular outbreak or flu-related question. With this architecture in place, the limiting factor was removed (calls to the diagnostics engine), allowing for tens of thousands of concurrent requests. Without a risk storming effort, this risk might not have been identified until an outbreak or flu season happened. 相应的架构变化在图 20-12 中进行了说明。请注意,除了两个队列通道(一个用于护士,一个用于自助患者)之外,还有一个新的服务叫做 Diagnostics Outbreak Cache Server,处理与特定疫情或流感相关的问题的所有请求。通过这种架构,限制因素被消除(对诊断引擎的调用),允许数以万计的并发请求。如果没有风险风暴会议,这个风险可能在疫情或流感季节发生之前不会被识别。
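The role of the Diagnostics Outbreak Cache Server can be illustrated with a small lookup that answers preloaded outbreak questions itself and forwards everything else. The sketch below is a simplification. The class name, the `preload` and `ask` methods, and the `engine_lookup` callable standing in for the diagnostics engine are all hypothetical.

```python
from typing import Callable, Dict


class OutbreakCache:
    """Answers known outbreak questions locally; forwards the rest (hypothetical sketch)."""

    def __init__(self, engine_lookup: Callable[[str], str]) -> None:
        # engine_lookup stands in for a call to the external diagnostics engine.
        self._engine_lookup = engine_lookup
        self._answers: Dict[str, str] = {}

    def preload(self, question: str, answer: str) -> None:
        # Loaded ahead of an outbreak or flu season.
        self._answers[question] = answer

    def ask(self, question: str) -> str:
        if question in self._answers:
            return self._answers[question]    # never reaches the diagnostics engine
        return self._engine_lookup(question)  # everything else still goes to the engine


engine_calls = []


def fake_engine(question: str) -> str:
    engine_calls.append(question)
    return f"engine answer for: {question}"


cache = OutbreakCache(fake_engine)
cache.preload("flu question", "cached flu answer")
print(cache.ask("flu question"))    # served from the cache
print(cache.ask("other question"))  # forwarded to the engine
```

Because cached questions bypass the engine entirely, the engine's throughput limit applies only to the traffic that is not outbreak-related.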
Encouraged by the results and success of the first two risk storming efforts, the architect decides to hold a final risk storming session on another important architecture characteristic that must be supported in the system to ensure its success: security. Due to HIPAA regulatory requirements, access to medical records via the medical record exchange interface must be secure, allowing only nurses to access medical records if needed. The architect believes this is not a problem due to security checks in the API gateway (authentication and authorization) but is curious whether the participants find any other elements of security risk. 受到前两次风险风暴活动的结果和成功的鼓舞,架构师决定就系统中必须支持的另一个重要架构特性举行最后一次风险风暴会议:安全性。由于 HIPAA 法规要求,通过医疗记录交换接口访问医疗记录必须是安全的,仅允许护士在需要时访问医疗记录。架构师认为,由于 API 网关中的安全检查(身份验证和授权),这不是问题,但他好奇参与者是否发现其他安全风险的要素。
During the risk storming, the participants all identified the Diagnostics System API gateway as a high security risk (6). The rationale for this high rating was the high impact of admin staff or self-service patients accessing medical records (3) combined with medium likelihood (2). Likelihood of risk occurring was not rated high because of the security checks for each API call, but still rated medium because all calls (self-service, admin, and nurses) are going through the same API gateway. The architect, who only rated the risk as low (2), was convinced during the risk storming consensus activity that the risk was in fact high and needed mitigation. 在风险风暴会议中,参与者一致认为诊断系统 API 网关是一个高安全风险(6)。这一高评级的理由是管理员或自助患者访问医疗记录的高影响(3)与中等可能性(2)相结合。风险发生的可能性没有被评为高,因为每个 API 调用都有安全检查,但仍然被评为中等,因为所有调用(自助、管理员和护士)都通过同一个 API 网关。架构师仅将风险评为低(2),但在风险风暴共识活动中被说服认为风险实际上是高的,并需要减轻。
The participants all agreed that having separate API gateways for each type of user (admin, self-service/diagnostics, and nurses) would prevent calls from either the admin web user interface or the self-service web user interface from ever reaching the medical records exchange interface. The architect agreed, creating the final architecture, as illustrated in Figure 20-13. 所有参与者一致认为,为每种用户类型(管理员、自助服务/诊断和护士)设置单独的 API 网关将防止来自管理员网页用户界面或自助服务网页用户界面的调用到达医疗记录交换接口。架构师同意了,创建了最终架构,如图 20-13 所示。
Figure 20-13. Final architecture modifications to address security risk 图 20-13. 最终架构修改以应对安全风险
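The effect of splitting the gateway can be expressed as a routing table: a gateway can only forward to the services it has a route for. The sketch below is an assumption-heavy illustration. The dictionary, the service names, and the exact route sets for each gateway are hypothetical. The only property taken from the text is that the admin and self-service gateways have no route to the medical records exchange.

```python
# Hypothetical route sets; only the absence of a medical records route for
# the admin and self-service gateways comes from the scenario itself.
GATEWAY_ROUTES = {
    "admin": {"nurse_profile_management"},
    "self_service": {"diagnostics_engine"},
    "nurse": {"case_management", "diagnostics_engine", "medical_records_exchange"},
}


def can_reach(gateway: str, service: str) -> bool:
    """True only if the named gateway has a route to the service."""
    return service in GATEWAY_ROUTES.get(gateway, set())


for gateway in GATEWAY_ROUTES:
    allowed = can_reach(gateway, "medical_records_exchange")
    print(f"{gateway} gateway -> medical records exchange: {allowed}")
```

With a single shared gateway, keeping admin and self-service callers away from medical records depends on every authorization check being correct. With separate gateways, the route simply does not exist for those callers.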
The prior scenario illustrates the power of risk storming. By collaborating with other architects, developers, and key stakeholders on dimensions of risk that are vital to the success of the system, risk areas are identified that would otherwise have gone unnoticed. Compare Figure 20-9 and Figure 20-13 and notice the significant difference in the architecture prior to risk storming and then after risk storming. Those significant changes address availability concerns, elasticity concerns, and security concerns within the architecture. 之前的场景展示了风险风暴的力量。通过与其他架构师、开发人员和关键利益相关者合作,关注对系统成功至关重要的风险维度,识别出那些本来会被忽视的风险领域。比较图 20-9 和图 20-13,注意风险风暴前后的架构之间的显著差异。这些显著的变化解决了架构中的可用性问题、弹性问题和安全问题。
Risk storming is not a one-time process. Rather, it is a continuous process through the life of any system to catch and mitigate risk areas before they happen in production. How often the risk storming effort happens depends on many factors, including frequency of change, architecture refactoring efforts, and the incremental development of the architecture. It is typical to undergo a risk storming effort on some particular dimension after a major feature is added or at the end of every iteration. 风险风暴并不是一次性的过程。相反,它是一个持续的过程,贯穿任何系统的生命周期,以便在生产中发生之前捕捉和减轻风险区域。风险风暴的频率取决于许多因素,包括变更频率、架构重构工作以及架构的增量开发。在添加主要功能后或在每次迭代结束时,通常会在某个特定维度上进行风险风暴工作。
CHAPTER 21 第 21 章
Diagramming and Presenting Architecture 架构图示和展示
Newly minted architects often comment on how surprised they are at how varied the job is outside of technical knowledge and experience, which enabled their move into the architect role to begin with. In particular, effective communication becomes critical to an architect’s success. No matter how brilliant an architect’s technical ideas, if they can’t convince managers to fund them and developers to build them, their brilliance will never manifest. 新晋架构师常常评论他们对工作在技术知识和经验之外的多样性感到惊讶,这些知识和经验使他们能够开始转向架构师角色。特别是,有效的沟通对架构师的成功至关重要。无论架构师的技术想法多么出色,如果他们不能说服管理者为其提供资金并说服开发者去实现这些想法,他们的才华将永远无法显现。
Diagramming and presenting architectures are two critical soft skills for architects. While entire books exist about each topic, we’ll hit some particular highlights for each. 图示和展示架构是架构师的两个关键软技能。虽然关于每个主题都有整本书,但我们将重点介绍每个主题的一些特别亮点。
These two topics appear together because they have a few similar characteristics: each forms an important visual representation of an architecture vision, presented using different media. However, representational consistency is a concept that ties both together. 这两个主题之所以一起出现,是因为它们有一些相似的特征:每个主题都形成了架构愿景的重要视觉表现,使用不同的媒介呈现。然而,表现一致性是将两者联系在一起的一个概念。
When visually describing an architecture, the creator often must show different views of the architecture. For example, the architect will likely show an overview of the entire architecture topology, then drill into individual parts to delve into design details. However, if the architect shows a portion without indicating where it lies within the overall architecture, it confuses viewers. Representational consistency is the practice of always showing the relationship between parts of an architecture, either in diagrams or presentations, before changing views. 在视觉描述架构时,创建者通常必须展示架构的不同视图。例如,架构师可能会展示整个架构拓扑的概述,然后深入到各个部分以探讨设计细节。然而,如果架构师展示某个部分而没有指明它在整体架构中的位置,这会让观众感到困惑。表现一致性是指在改变视图之前,总是展示架构各部分之间的关系,无论是在图表还是演示中。
For example, if an architect wanted to describe the details of how the plug-ins relate to one another in the Silicon Sandwiches solution, the architect would show the entire topology, then drill into the plug-in structure, showing the viewers the relationship between them; an example of this appears in Figure 21-1. 例如,如果架构师想要描述在硅三明治解决方案中插件之间的关系细节,架构师将展示整个拓扑结构,然后深入到插件结构,向观众展示它们之间的关系;这一示例出现在图 21-1 中。
Figure 21-1. Using representational consistency to indicate context in a larger diagram 图 21-1. 使用表现一致性在更大图表中指示上下文
Careful use of representational consistency ensures that viewers understand the scope of items being presented, eliminating a common source of confusion. 谨慎使用表现一致性确保观众理解所呈现项目的范围,从而消除常见的混淆来源。
Diagramming 图示化
The topology of architecture is always of interest to architects and developers because it captures how the structure fits together and forms a valuable shared understanding across the team. Therefore, architects should hone their diagramming skills to razor sharpness. 架构的拓扑结构始终引起架构师和开发人员的兴趣,因为它捕捉了结构如何组合在一起,并在团队中形成了宝贵的共同理解。因此,架构师应该将他们的图示技能磨练得锋利无比。
Tools 工具
The current generation of diagramming tools for architects is extremely powerful, and an architect should learn their tool of choice deeply. However, before going to a nice tool, don’t neglect low-fidelity artifacts, especially early in the design process. Building very ephemeral design artifacts early prevents architects from becoming overly attached to what they have created, an anti-pattern we named the Irrational Artifact Attachment anti-pattern. 当前一代建筑师的图表工具非常强大,建筑师应该深入学习他们选择的工具。然而,在使用优秀工具之前,不要忽视低保真度的工件,特别是在设计过程的早期。早期构建非常短暂的设计工件可以防止建筑师对他们所创造的东西过于依恋,这是一种我们称之为非理性工件依附的反模式。
Irrational Artifact Attachment 非理性工件依附
…is the proportional relationship between a person’s irrational attachment to some artifact and how long it took to produce. If an architect creates a beautiful diagram using some tool like Visio that takes two hours, they have an irrational attachment to that artifact that’s roughly proportional to the amount of time invested, which also means they will be more attached to a four-hour diagram than a two-hour one. …是一个人对某个工件的非理性依恋与生产该工件所花费时间之间的比例关系。如果一个架构师使用像 Visio 这样的工具创建了一个美丽的图表,花费了两个小时,他们对该工件的非理性依恋大致与投入的时间成正比,这也意味着他们会对一个花费四个小时的图表比对一个花费两个小时的图表更有依恋。
One of the benefits to the low-ritual approach used in Agile software development revolves around creating just-in-time artifacts, with as little ceremony as possible (this helps explain the dedication of lots of agilists to index cards and sticky notes). Using low-tech tools lets team members throw away what’s not right, freeing them to experiment and allow the true nature of the artifact to emerge through revision, collaboration, and discussion. 在敏捷软件开发中,低仪式方法的一个好处是能够创建及时的工件,尽可能少地进行仪式(这有助于解释许多敏捷开发者对索引卡和便签的热衷)。使用低技术工具使团队成员能够丢弃不合适的东西,从而让他们自由地进行实验,并通过修订、协作和讨论让工件的真实性质显现出来。
An architect’s favorite variation on the cell phone photo of a whiteboard (along with the inevitable “Do Not Erase!” imperative) uses a tablet attached to an overhead projector rather than a whiteboard. This offers several advantages. First, the tablet has an unlimited canvas and can fit as many drawings as a team might need. Second, it allows copy/paste “what if” scenarios that would obscure the original if done on a whiteboard. Third, images captured on a tablet are already digitized and don’t have the inevitable glare associated with cell phone photos of whiteboards. 架构师最喜欢的做法,是对白板手机照片(以及不可避免的“请勿擦除!”的命令)的一种变体:使用连接到投影仪的平板电脑,而不是白板。这提供了几个优势。首先,平板电脑拥有无限的画布,可以容纳团队可能需要的任意数量的图纸。其次,它允许复制/粘贴“如果”场景,而这在白板上完成时会遮蔽原始内容。第三,平板电脑上捕获的图像已经数字化,并且没有与白板手机照片相关的不可避免的眩光。
Eventually, an architect needs to create nice diagrams in a fancy tool, but make sure the team has iterated on the design sufficiently to invest time in capturing something. 最终,架构师需要在一个华丽的工具中创建漂亮的图表,但要确保团队在设计上进行了足够的迭代,以便投入时间去捕捉某些内容。
Powerful tools exist to create diagrams on every platform. While we don’t necessarily advocate one over another (we quite happily used OmniGraffle for all the diagrams in this book), architects should look for at least this baseline of features: 强大的工具可以在每个平台上创建图表。虽然我们并不一定主张某一个工具优于另一个(我们在本书中非常乐意使用 OmniGraffle 制作所有图表),但架构师应该至少寻找以下基本功能:
Layers 层次
Drawing tools often support layers, which architects should learn well. A layer allows the drawer to link a group of items together logically to enable hiding/showing individual layers. Using layers, an architect can build a comprehensive diagram but hide overwhelming details when they aren’t necessary. Using layers also allows architects to incrementally build pictures for presentations later (see “Incremental Builds” on page 322). 绘图工具通常支持图层,架构师应该很好地学习这一点。图层允许绘图者将一组项目在逻辑上链接在一起,以便能够隐藏/显示单独的图层。使用图层,架构师可以构建一个全面的图表,但在不必要时隐藏过多的细节。使用图层还允许架构师逐步构建后续演示的图像(见第 322 页的“增量构建”)。
Stencils/templates 模板/模版
Stencils allow an architect to build up a library of common visual components, often composites of other basic shapes. For example, throughout this book, readers have seen standard pictures of things like microservices, which exist as a single item in the authors’ stencil. Building a stencil for common patterns and artifacts within an organization creates consistency within architecture diagrams and allows the architect to build new diagrams quickly. 模板允许架构师建立一个常见视觉组件的库,通常是其他基本形状的组合。例如,在本书中,读者已经看到了像微服务这样的标准图示,它们作为作者模板中的单个项目存在。为组织内的常见模式和工件构建模板,可以使架构图保持一致,并允许架构师快速构建新的图表。
Magnets 磁铁
Many drawing tools offer assistance when drawing lines between shapes. Magnets represent the places on those shapes where lines snap to connect automatically, providing automatic alignment and other visual niceties. Some tools allow the architect to add more magnets or create their own to customize how the connections look within their diagrams. 许多绘图工具在绘制形状之间的线条时提供帮助。磁铁表示这些形状上线条自动连接的地方,提供自动对齐和其他视觉效果。一些工具允许架构师添加更多磁铁或创建自己的磁铁,以自定义连接在图表中的外观。
In addition to these specific helpful features, the tool should, of course, support lines, colors, and other visual artifacts, as well as the ability to export in a wide variety of formats. 除了这些特定的有用功能外,该工具当然还应支持线条、颜色和其他视觉元素,以及以多种格式导出文件的能力。
Diagramming Standards: UML, C4, and ArchiMate 图示标准:UML、C4 和 ArchiMate
Several formal standards exist for technical diagrams in software. 在软件中,存在几种正式的技术图表标准。
UML
Unified Modeling Language (UML) was a standard that unified three competing design philosophies that coexisted in the 1980s. It was supposed to be the best of all worlds but, like many things designed by committee, failed to create much impact outside organizations that mandated its use. 统一建模语言(UML)是一个标准,它统一了 1980 年代共存的三种竞争设计理念。它本应是各方面的最佳选择,但像许多由委员会设计的事物一样,未能在强制使用它的组织之外产生太大影响。
Architects and developers still use UML class and sequence diagrams to communicate structure and workflow, but most of the other UML diagram types have fallen into disuse. 架构师和开发人员仍然使用 UML 类图和时序图来传达结构和工作流程,但大多数其他 UML 图类型已经不再使用。
C4
C4 is a diagramming technique developed by Simon Brown to address deficiencies in UML and modernize its approach. The four C’s in C4 are as follows: C4 是由 Simon Brown 开发的一种图示技术,旨在解决 UML 的不足并使其方法现代化。C4 中的四个 C 如下:
Context 上下文
Represents the entire context of the system, including the roles of users and external dependencies. 表示系统的整个上下文,包括用户的角色和外部依赖。
Container 容器
The physical (and often logical) deployment boundaries and containers within the architecture. This view forms a good meeting point for operations and architects. 架构中的物理(通常也是逻辑)部署边界和容器。这个视图为运维和架构师提供了一个良好的交汇点。
Component 组件
The component view of the system; this most neatly aligns with an architect’s view of the system. 系统的组件视图;这与架构师对系统的看法最为一致。
Class 类
C4 uses the same style of class diagrams from UML, which are effective, so there is no need to replace them. C4 使用与 UML 相同风格的类图,这些类图是有效的,因此没有必要替换它们。
If a company seeks to standardize on a diagramming technique, C4 offers a good alternative. However, like all technical diagramming tools, it suffers from an inability to express every kind of design an architecture might undertake. C4 is best suited for monolithic architectures where the container and component relationships may differ, and it’s less suited to distributed architectures, such as microservices. 如果一家公司希望在图示技术上实现标准化,C4 提供了一个不错的替代方案。然而,像所有技术图示工具一样,它无法表达架构可能进行的每种设计。C4 最适合单体架构,其中容器和组件关系可能不同,而不太适合分布式架构,例如微服务。
ArchiMate
ArchiMate (an amalgam of Architecture-Animate) is an open source enterprise architecture modeling language to support the description, analysis, and visualization of architecture within and across business domains. ArchiMate is a technical standard from The Open Group, and it offers a lighter-weight modeling language for enterprise ecosystems. The goal of ArchiMate is to be “as small as possible,” not to cover every edge case scenario. As such, it has become a popular choice among many architects. ArchiMate(Architecture-Animate 的合成词)是一种开源企业架构建模语言,用于支持在业务领域内及跨业务领域的架构描述、分析和可视化。ArchiMate 是 The Open Group 的技术标准,它为企业生态系统提供了一种更轻量级的建模语言。ArchiMate 的目标是“尽可能小”,而不是覆盖每一个边缘案例。因此,它已成为许多架构师的热门选择。
Diagram Guidelines 图表指南
Regardless of whether an architect uses their own modeling language or one of the formal ones, they should build their own style when creating diagrams and should feel free to borrow from representations they think are particularly effective. Here are some general guidelines to use when creating technical diagrams. 无论架构师使用自己的建模语言还是某种正式语言,他们在创建图表时都应该建立自己的风格,并且可以自由借鉴他们认为特别有效的表现形式。以下是创建技术图表时的一些一般性指导原则。
Titles 标题
Make sure all the elements of the diagram have titles or are well known to the audience. Use rotation and other effects to make titles “sticky” to the thing they associate with and to make efficient use of space. 确保图表中的所有元素都有标题或为观众所熟知。使用旋转和其他效果使标题与其关联的事物“粘性”并有效利用空间。
Lines 线条
Lines should be thick enough to see well. If lines indicate information flow, then use arrows to indicate directional or two-way traffic. Different types of arrowheads might suggest different semantics, but architects should be consistent. 线条应该足够粗,以便清晰可见。如果线条表示信息流,则使用箭头指示单向或双向流动。不同类型的箭头可能暗示不同的语义,但架构师应该保持一致。
Generally, one of the few standards that exists in architecture diagrams is that solid lines tend to indicate synchronous communication and dotted lines indicate asynchronous communication. 通常,架构图中存在的少数标准之一是实线通常表示同步通信,而虚线表示异步通信。
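For teams that generate diagrams from text, the solid/dotted convention can be encoded once so that every diagram applies it the same way. The sketch below emits Graphviz DOT text. It is an illustration only, not a tool the authors describe, and the function name and the edge tuple layout are hypothetical.

```python
from typing import Iterable, Tuple

# An edge is (source, target, synchronous); this tuple layout is hypothetical.
Edge = Tuple[str, str, bool]


def to_dot(edges: Iterable[Edge]) -> str:
    """Render edges as Graphviz DOT: solid lines for synchronous calls, dotted for asynchronous."""
    lines = ["digraph architecture {"]
    for source, target, synchronous in edges:
        style = "solid" if synchronous else "dotted"
        lines.append(f'  "{source}" -> "{target}" [style={style}];')
    lines.append("}")
    return "\n".join(lines)


print(to_dot([
    ("API gateway", "case management", True),     # synchronous request
    ("API gateway", "diagnostics queue", False),  # asynchronous message
]))
```

Keeping the style choice in one function means a reader never has to wonder whether a dotted line in one diagram means the same thing in the next.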
Shapes 形状
While the formal modeling languages described all have standard shapes, no pervasive standard shapes exist across the software development world. Thus, each architect tends to make their own standard set of shapes, sometimes spreading those across an organization to create a standard language. 虽然所描述的正式建模语言都有标准形状,但在软件开发领域并不存在普遍的标准形状。因此,每个架构师往往会创建自己的一套标准形状,有时会在组织内部传播这些形状以创建标准语言。
We tend to use three-dimensional boxes to indicate deployable artifacts and rectangles to indicate containership, but we don’t have any particular key beyond that. 我们倾向于使用三维盒子来表示可部署的工件,使用矩形来表示容器,但我们没有其他特别的标记。
Labels 标签
Architects should label each item in a diagram, especially if there is any chance of ambiguity for the readers. 架构师应该在图表中标记每个项目,特别是如果读者可能会产生任何歧义的情况下。
Color 颜色
Architects often don’t use color enough. For many years, books were out of necessity printed in black and white, so architects and developers became accustomed to monochrome drawings. While we still favor monochrome, we use color when it helps distinguish one artifact from another. For example, when discussing microservices communication strategies in “Communication” on page 254, we used color to indicate that two different microservices participate in the coordination, not two instances of the same service, as reproduced in Figure 21-2. 架构师往往不够使用颜色。多年来,书籍出于必要性以黑白印刷,因此架构师和开发人员习惯了单色图纸。虽然我们仍然偏爱单色,但在有助于区分一个工件与另一个工件时,我们会使用颜色。例如,在第 254 页的“通信”中讨论微服务通信策略时,我们使用颜色来表示两个不同的微服务参与协调,而不是同一服务的两个实例,如图 21-2 所示。
Figure 21-2. Reproduction of microservices communication example showing different services in unique colors 图 21-2. 微服务通信示例的再现,显示不同服务以独特颜色呈现
Keys 图例
If shapes are ambiguous for any reason, include a key on the diagram clearly indicating what each shape represents. Nothing is worse than a diagram that leads to misinterpretation, which is worse than no diagram. 如果形状因任何原因而模糊不清,请在图表上包含一个关键,清楚地指示每个形状代表的内容。没有什么比导致误解的图表更糟糕,这比没有图表还要糟糕。
Presenting 呈现
The other soft skill required by modern architects is the ability to conduct effective presentations using tools like PowerPoint and Keynote. These tools are the lingua franca of modern organizations, and people throughout the organization expect competent use of these tools. Unfortunately, unlike word processors and spreadsheets, no one seems to spend much time studying how to use these tools well. 现代架构师所需的另一项软技能是能够使用 PowerPoint 和 Keynote 等工具进行有效的演示。这些工具是现代组织的通用语言,组织内的人员期望能够熟练使用这些工具。不幸的是,与文字处理器和电子表格不同,似乎没有人花太多时间研究如何很好地使用这些工具。
Neal, one of the coauthors of this book, wrote a book several years ago entitled Presentation Patterns (Addison-Wesley Professional), about taking the patterns/antipatterns approach common in the software world and applying it to technical presentations. Neal,这本书的合著者之一,几年前写了一本名为《Presentation Patterns》(Addison-Wesley Professional)的书,关于在软件领域常见的模式/反模式方法,并将其应用于技术演示。
Presentation Patterns makes an important observation about the fundamental difference between creating a document versus a presentation to make a case for something: time. In a presentation, the presenter controls how quickly an idea is unfolding, whereas the reader of a document controls that. Thus, one of the most important skills an architect can learn in their presentation tool of choice is how to manipulate time. 《Presentation Patterns》对创建文档与制作演示来论证某事之间的根本区别做出了重要观察:时间。在演示中,演讲者控制着一个想法展开的速度,而文档的读者则控制这一点。因此,架构师在其选择的演示工具中可以学习的最重要技能之一就是如何操控时间。
Manipulating Time 操控时间
Presentation tools offer two ways to manipulate time on slides: transitions and animations. Transitions move from slide to slide, and animations allow the designer to create movement within a slide. Typically, presentation tools allow just one transition per slide but a host of animations for each element: build in (appearance), build out (disappearance), and actions (such as movement, scale, and other dynamic behavior). 演示工具提供了两种在幻灯片上操控时间的方法:过渡和动画。过渡用于从一张幻灯片切换到另一张幻灯片,而动画则允许设计师在幻灯片内创建运动。通常,演示工具每张幻灯片只允许一个过渡,但每个元素可以有多种动画:出现(build in)、消失(build out)和动作(如移动、缩放和其他动态行为)。
While tools offer a variety of splashy effects like dropping anvils, architects use transition and animations to hide the boundaries between slides. One common antipattern called out in Presentation Patterns named Cookie-Cutter states that ideas don’t have a predetermined word count, and accordingly, designers shouldn’t artificially pad content to make it appear to fill a slide. Similarly, many ideas are bigger than a single slide. Using subtle combinations of transitions and animations such as dissolve allows presenters to hide individual slide boundaries, stitching together a set of slides to tell a single story. To indicate the end of a thought, presenters should use a distinctly different transition (such as door or cube) to provide a visual clue that they are moving to a different topic. 虽然工具提供了多种华丽的效果,比如掉落铁砧,但架构师使用过渡和动画来隐藏幻灯片之间的边界。在《演示模式》中提到的一个常见反模式叫做“模具”,它指出想法没有预定的字数,因此设计师不应该人为地填充内容以使其看起来填满一张幻灯片。同样,许多想法比单一幻灯片更大。使用溶解等微妙的过渡和动画组合可以让演讲者隐藏单个幻灯片的边界,将一组幻灯片拼接在一起讲述一个完整的故事。为了表示一个想法的结束,演讲者应该使用明显不同的过渡(如门或立方体)来提供视觉线索,表明他们正在转向一个不同的话题。
Incremental Builds 增量构建
The Presentation Patterns book calls out the Bullet-Riddled Corpse as a common antipattern of corporate presentations, where every slide is essentially the speaker’s notes, projected for all to see. Most readers have had the excruciating experience of watching a slide full of text appear during a presentation, then reading the entire thing (because no one can resist reading it all as soon as it appears), only to sit for the next 10 minutes while the presenter slowly reads the bullets to the audience. No wonder so many corporate presentations are dull! 《演示模式》一书指出,Bullet-Riddled Corpse 是企业演示的常见反模式,其中每一张幻灯片基本上都是演讲者的笔记,投影给所有人看。大多数读者都有过在演示中看到一张满是文字的幻灯片出现的痛苦经历,然后阅读整个内容(因为没有人能抵挡住一出现就想全部阅读),接着在接下来的 10 分钟里坐在那里,听演讲者慢慢地向观众朗读要点。难怪这么多企业演示都很乏味!
When presenting, the speaker has two information channels: verbal and visual. By placing too much text on the slides and then saying essentially the same words, the presenter is overloading one information channel and starving the other. The better solution to this problem is to use incremental builds for slides, building up (hopefully graphical) information as needed rather than all at once. 在演示时,演讲者有两个信息通道:语言和视觉。通过在幻灯片上放置过多的文本,然后说出基本相同的话,演讲者使一个信息通道过载,而另一个则缺乏信息。解决这个问题的更好方法是对幻灯片使用增量构建,根据需要逐步增加(希望是图形)信息,而不是一次性全部展示。
For example, say that an architect creates a presentation explaining the problems using feature branching and wants to talk about the negative consequences of keeping branches alive too long. Consider the graphical slide shown in Figure 21-3. 例如,假设一位架构师创建了一个演示,解释使用特性分支的问题,并想谈论保持分支存活时间过长的负面后果。请考虑图 21-3 中显示的图形幻灯片。
Figure 21-3. Bad version of a slide showing a negative anti-pattern 图 21-3. 显示负面反模式的幻灯片的糟糕版本
In Figure 21-3, if the presenter shows the entire slide right away, the audience can see that something bad happens toward the end, but they have to wait for the exposition to get to that point. 在图 21-3 中,如果演讲者立即展示整个幻灯片,观众会看到在最后出现了一些不好的情况,但他们必须等待阐述才能到达那个点。
Instead, the architect should use the same image but obscure parts of it when showing the slide (using a borderless white box) and expose a portion at a time (by performing a build out on the covering box), as shown in Figure 21-4. 相反,架构师应该使用相同的图像,但在展示幻灯片时遮挡其部分内容(使用无边框的白色框),并一次暴露一部分(通过对覆盖框进行展开),如图 21-4 所示。
Figure 21-4. A better, incremental version that maintains suspense 图 21-4. 一个更好的、渐进的版本,保持悬念
In Figure 21-4, the presenter still has a fighting chance of keeping some suspense alive, making the talk inherently more interesting. 在图 21-4 中,演讲者仍然有机会保持一些悬念,使演讲本质上更有趣。
Using animations and transitions in conjunction with incremental builds allows the presenter to make more compelling, entertaining presentations. 使用动画和过渡效果结合增量构建,可以让演示者制作更引人入胜、娱乐性更强的演示。
Infodecks Versus Presentations 信息甲板与演示文稿
Some architects build slide decks in tools like PowerPoint and Keynote but never actually present them. Rather, they are emailed around like a magazine article, and each individual reads them at their own pace. Infodecks are slide decks that are not meant to be projected but rather summarize information graphically, essentially using a presentation tool as a desktop publishing package. 一些架构师在 PowerPoint 和 Keynote 等工具中制作幻灯片,但实际上从不进行演示。相反,它们像杂志文章一样通过电子邮件发送,每个人都可以按照自己的节奏阅读。Infodecks 是不打算投影的幻灯片,而是以图形方式总结信息,基本上将演示工具用作桌面出版软件。
The difference between these two media is comprehensiveness of content and use of transitions and animations. If someone is going to flip through the deck like a magazine article, the author of the slides does not need to add any time elements. The other key difference between infodecks and presentations is the amount of material. Because infodecks are meant to be standalone, they contain all the information the creator wants to convey. When doing a presentation, the slides are (purposefully) meant to be half of the presentation, the other half being the person standing there talking! 这两种媒体之间的区别在于内容的全面性以及过渡和动画的使用。如果有人打算像翻阅杂志文章一样浏览幻灯片,幻灯片的作者不需要添加任何时间元素。信息图和演示文稿之间的另一个关键区别是材料的数量。因为信息图是为了独立使用,所以它们包含了创作者想要传达的所有信息。在进行演示时,幻灯片(故意)是演示的一半,另一半是站在那里讲话的人!
Slides Are Half of the Story 幻灯片只是故事的一半
A common mistake that presenters make is building the entire content of the presentation into the slides. However, if the slides are comprehensive, the presenter should spare everyone the time of sitting through a presentation and just email it to everyone as a deck! Presenters make the mistake of adding too much material to slides when they could make the important points more powerfully by speaking them. Remember, presenters have two information channels, so using them strategically can add more punch to the message. A great example of that is the strategic use of invisibility. 演讲者常犯的一个错误是将演示文稿的全部内容都放入幻灯片中。然而,如果幻灯片内容全面,演讲者应该节省大家坐在演讲会上的时间,直接将其作为文档通过电子邮件发送给每个人!演讲者常常在幻灯片中添加过多材料,而这些重要观点本可以通过口头讲述更有力地表达。请记住,演讲者有两个信息通道,因此战略性地使用它们可以增强信息的冲击力。一个很好的例子就是战略性地使用隐形。
Invisibility
Invisibility is a simple pattern where the presenter inserts a blank black slide within a presentation to refocus attention solely on the speaker. If an audience has two information channels (slides and speaker) and one of them is turned off (the slides), the other automatically gets more emphasis. Thus, a presenter who wants to make a point can insert a blank slide: everyone in the room will focus their attention back on the speaker, who is now the only interesting thing in the room to look at.
Learning the basics of a presentation tool and a few techniques to make presentations better is a great addition to an architect’s skill set. If an architect has a great idea but can’t figure out a way to present it effectively, they will never get a chance to realize that vision. Architecture requires collaboration; to get collaborators, architects must convince people to sign on to their vision. The modern corporate soapboxes are presentation tools, so it’s worth learning to use them well.
CHAPTER 22
Making Teams Effective
In addition to creating a technical architecture and making architecture decisions, a software architect is also responsible for guiding the development team through the implementation of the architecture. Software architects who do this well create effective development teams that work closely together to solve problems and create winning solutions. While this may sound obvious, too many times we’ve seen architects ignore development teams and work in siloed environments to create an architecture. That architecture then gets handed off to a development team, which struggles to implement it correctly. Being able to make teams productive is one of the ways effective and successful software architects differentiate themselves from other software architects. In this chapter we introduce some basic techniques an architect can leverage to make development teams effective.
Team Boundaries
It’s been our experience that a software architect can significantly influence the success or failure of a development team. Teams that feel left out of the loop or estranged from software architects (and also the architecture) often lack the right level of guidance and the right level of knowledge about the various constraints on the system, and consequently do not implement the architecture correctly.
One of the roles of a software architect is to create and communicate the constraints, or the box, in which developers can implement the architecture. Architects can create boundaries that are too tight, too loose, or just right. These boundaries are illustrated in Figure 22-1. A boundary that is too tight or too loose has a direct impact on the teams’ ability to successfully implement the architecture.
Figure 22-1. Boundary types created by a software architect
Architects who create too many constraints form a tight box around the development teams, preventing access to many of the tools, libraries, and practices that are required to implement the system effectively. This causes frustration within the team, usually resulting in developers leaving the project for happier and healthier environments.
The opposite can also happen. A software architect can create constraints that are too loose (or no constraints at all), leaving all of the important architecture decisions to the development team. In this scenario, which is just as bad as tight constraints, the team essentially takes on the role of a software architect, performing proofs of concept and battling over design decisions without the proper level of guidance, resulting in unproductiveness, confusion, and frustration.
An effective software architect strives to provide the right level of guidance and constraints so that the team has the correct tools and libraries in place to effectively implement the architecture. The rest of this chapter is devoted to how to create these effective boundaries.
Architect Personalities
There are three basic types of architect personalities: a control freak architect (Figure 22-2), an armchair architect (Figure 22-3), and an effective architect (Figure 22-5). Each personality matches a particular boundary type discussed in the prior section on team boundaries: control freak architects produce tight boundaries, armchair architects produce loose boundaries, and effective architects produce just the right kinds of boundaries.
Control Freak
Figure 22-2. Control freak architect (iStockPhoto)
The control freak architect tries to control every detailed aspect of the software development process. Every decision a control freak architect makes is usually too fine-grained and too low-level, resulting in too many constraints on the development team.
Control freak architects produce the tight boundaries discussed in the prior section. A control freak architect might restrict the development team from downloading any useful open source or third-party libraries and instead insist that the teams write everything from scratch using the language API. Control freak architects might also place tight restrictions on naming conventions, class design, method length, and so on. They might even go so far as to write pseudocode for the development teams. Essentially, control freak architects steal the art of programming away from the developers, resulting in frustration and a lack of respect for the architect.
It is very easy to become a control freak architect, particularly when transitioning from developer to architect. An architect’s role is to create the building blocks of the application (the components) and determine the interactions between those components. The developer’s role in this effort is to then take those components and determine how they will be implemented using class diagrams and design patterns. However, in the transition from developer to architect, it is all too tempting to want to create the class diagrams and design patterns as well, since that was the newly minted architect’s prior role.
For example, suppose an architect creates a component (a building block of the architecture) to manage reference data within the system. Reference data consists of static name-value pair data used on the website, as well as things like product codes and warehouse codes (static data used throughout the system). The architect’s role is to identify the component (in this case, Reference Manager), determine the core set of operations for that component (for example, GetData, SetData, ReloadCache, NotifyOnUpdate, and so on), and decide which components need to interact with the Reference Manager. The control freak architect might think that the best way to implement this component is through a parallel loader pattern leveraging an internal cache, with a particular data structure for that cache. While this might be an effective design, it’s not the only design. More importantly, it’s no longer the architect’s role to come up with this internal design for the Reference Manager; it’s the role of the developer.
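The division of labor in the Reference Manager example can be sketched in code. The following is only an illustration under stated assumptions: the Python method names mirror the operations named in the text (GetData, SetData, ReloadCache, NotifyOnUpdate), while the signatures, the `InMemoryReferenceManager` class, and the sample warehouse codes are hypothetical. The abstract class stands for what the architect specifies; the concrete class is one of many internal designs a developer might choose.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

# Hypothetical listener signature: called with (key, new_value) after an update.
UpdateListener = Callable[[str, str], None]


class ReferenceManager(ABC):
    """The component contract: roughly the level of detail the architect owns."""

    @abstractmethod
    def get_data(self, key: str) -> str: ...

    @abstractmethod
    def set_data(self, key: str, value: str) -> None: ...

    @abstractmethod
    def reload_cache(self) -> None: ...

    @abstractmethod
    def notify_on_update(self, listener: UpdateListener) -> None: ...


class InMemoryReferenceManager(ReferenceManager):
    """One possible internal design (a plain dict cache), chosen by a developer."""

    def __init__(self, source: Dict[str, str]) -> None:
        self._source = source                       # stand-in for the system of record
        self._cache: Dict[str, str] = dict(source)  # internal detail, not the architect's call
        self._listeners: List[UpdateListener] = []

    def get_data(self, key: str) -> str:
        return self._cache[key]

    def set_data(self, key: str, value: str) -> None:
        self._source[key] = value
        self._cache[key] = value
        for listener in self._listeners:
            listener(key, value)

    def reload_cache(self) -> None:
        self._cache = dict(self._source)

    def notify_on_update(self, listener: UpdateListener) -> None:
        self._listeners.append(listener)


source = {"WH-01": "Boston warehouse"}
manager = InMemoryReferenceManager(source)
seen = []
manager.notify_on_update(lambda key, value: seen.append((key, value)))
manager.set_data("WH-02", "Denver warehouse")
print(manager.get_data("WH-02"), seen)
```

A parallel loader with a specialized cache structure could replace `InMemoryReferenceManager` without changing the contract, which is the point: the internal design can vary freely as long as the component's operations stay the same.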
As we’ll discuss in “How Much Control?” on page 331, sometimes an architect needs to play the role of a control freak, depending on the complexity of the project and the skill level on the team. However, in most cases a control freak architect disrupts the development team, doesn’t provide the right level of guidance, gets in the way, and is ineffective at leading the team through the implementation of the architecture.
Armchair Architect
Figure 22-3. Armchair architect (iStockPhoto)
The armchair architect is the type of architect who hasn’t coded in a very long time (if at all) and doesn’t take the implementation details into account when creating an architecture. They are typically disconnected from the development teams, never around, or simply move from project to project once the initial architecture diagrams are completed.
In some cases the armchair architect is simply in way over their head in terms of the technology or business domain and therefore cannot possibly lead or guide teams from a technical or business problem standpoint. For example, what do developers do? Why, they code, of course. Writing program code is really hard to fake; either a developer writes software code, or they don’t. However, what does an architect do? No one knows! Most architects draw lots of lines and boxes, but how detailed should an architect be in those diagrams? Here’s a dirty little secret about architecture: it’s really easy to fake it as an architect!
Suppose an armchair architect is in way over their head or doesn’t have the time to architect an appropriate solution for a stock trading system. In that case the architecture diagram might look like the one illustrated in Figure 22-4. There’s nothing wrong with this architecture; it’s just too high level to be of any use to anyone.
Figure 22-4. Trading system architecture created by an armchair architect
Armchair architects create loose boundaries around development teams, as discussed in the prior section. In this scenario, development teams end up taking on the role of architect, essentially doing the work an architect is supposed to be doing. Team velocity and productivity suffer as a result, and teams get confused about how the system should work.
As with the control freak architect, it is all too easy to become an armchair architect. The biggest indicator that an architect might be falling into the armchair architect personality is not having enough time to spend with the development teams implementing the architecture (or choosing not to spend time with them). Development teams need an architect’s support and guidance, and they need the architect available to answer technical or business-related questions when they arise. Other indicators of an armchair architect are the following:
Not fully understanding the business domain, business problem, or technology used
Not enough hands-on experience developing software
Not considering the implications associated with the implementation of the architecture solution
In some cases it is not the intention of an architect to become an armchair architect; rather, it just “happens” by being spread too thin between projects or development teams and losing touch with the technology or the business domain. An architect can avoid this personality by getting more involved in the technology being used on the project and by understanding the business problem and business domain.
Effective Architect
An effective software architect produces the appropriate constraints and boundaries for the team, ensuring that the team members are working well together and have the right level of guidance. The effective architect also ensures that the team has the correct and appropriate tools and technologies in place. In addition, they remove any roadblocks that may be in the way of the development teams reaching their goals.
While this sounds obvious and easy, it is not. There is an art to becoming an effective leader on the development team. Becoming an effective software architect requires working closely and collaborating with the team, and gaining the respect of the team as well. We’ll be looking at other ways of becoming an effective software architect in later chapters in this part of the book. But for now, we’ll introduce some guidelines for knowing how much control an effective architect should exert on a development team.
How Much Control?
Becoming an effective software architect means knowing how much control to exert on a given development team. This concept is known as Elastic Leadership and is widely evangelized by author and consultant Roy Osherove. We’re going to deviate a bit from the work Osherove has done in this area and focus on specific factors for software architecture.
Knowing how much an effective software architect should be a control freak and how much they should be an armchair architect involves five main factors. These factors also determine how many teams (or projects) a software architect can manage at once:
Team familiarity
How well do the team members know each other? Have they worked together before on a project? Generally, the better team members know each other, the less control is needed because team members start to become self-organizing. Conversely, the newer the team members, the more control is needed to help facilitate collaboration among team members and reduce cliques within the team.
Team size
How big is the team? (We consider more than 12 developers on the same team to be a big team, and 4 or fewer to be a small team.) The larger the team, the more control is needed. The smaller the team, the less control is needed. This is discussed in more detail in “Team Warning Signs” on page 335.
Overall experience
How many team members are senior? How many are junior? Is it a mixed team of junior and senior developers? How well do they know the technology and business domain? Teams with lots of junior developers require more control and mentoring, whereas teams with more senior developers require less control. In the latter case, the architect moves from the role of a mentor to that of a facilitator.
Project complexity
Is the project highly complex or just a simple website? Highly complex projects require the architect to be more available to the team and to assist with issues that arise, hence more control is needed on the team. Relatively simple projects are straightforward and hence do not require much control.
Project duration
Is the project short (two months), long (two years), or average duration (six months)? The shorter the duration, the less control is needed; conversely, the longer the project, the more control is needed.
While most of the factors make intuitive sense with regard to more or less control, the project duration factor may not. As indicated in the prior list, the shorter the project duration, the less control is needed; the longer the project duration, the more control is needed. Intuitively this might seem reversed, but it is not. Consider a quick two-month project. Two months is not a lot of time to qualify requirements, experiment, develop code, test every scenario, and release into production. In this case the architect should act more as an armchair architect, as the development team already has a keen sense of urgency. A control freak architect would just get in the way and likely delay the project. Conversely, think of a project duration of two years. In this scenario the developers are relaxed, not thinking in terms of urgency, and likely planning vacations and taking long lunches. More control is needed by the architect to ensure the project moves along in a timely fashion and that complex tasks are accomplished first.
Most projects use these factors to determine the level of control at the start of the project; but as the system continues to evolve, the appropriate level of control changes. Therefore, we advise continually analyzing these factors throughout the life cycle of a project to determine how much control to exert on the development team.
To illustrate how each of these factors can be used to determine the level of control an architect should have on a team, assume a fixed scale of 20 points for each factor. Minus values point more toward being an armchair architect (less control and involvement), whereas plus values point more toward being a control freak architect (more control and involvement). This scale is illustrated in Figure 22-6.
Figure 22-6. Scale for the amount of control
Applying this sort of scaling is not exact, of course, but it does help in determining the relative control to exert on a team. For example, consider the project scenario shown in Table 22-1 and Figure 22-7. As shown in the table, each factor points toward either a control freak (+20) or an armchair architect (-20). These ratings add up to an accumulated score of -60, indicating that the architect should play more of an armchair architect role and not get in the team’s way.
Table 22-1. Scenario 1 example for amount of control
| Factor | Value | Rating | Personality |
| :--- | :--- | :--- | :--- |
| Team familiarity | New team members | +20 | Control freak |
| Team size | Small (4 members) | -20 | Armchair architect |
| Overall experience | All experienced | -20 | Armchair architect |
| Project complexity | Relatively simple | -20 | Armchair architect |
| Project duration | 2 months | -20 | Armchair architect |
| Accumulated score | | -60 | Armchair architect |
Figure 22-7. Amount of control for scenario 1
In scenario 1, these factors are all taken into account to demonstrate that an effective software architect should initially play the role of facilitator and not get too involved in the day-to-day interactions with the team. The architect will be needed to answer questions and to make sure the team is on track, but for the most part the architect should be largely hands-off and let the experienced team do what they know best: develop software quickly.
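The arithmetic behind scenario 1 can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed tool: the ratings come from Table 22-1, but the function names and the "balanced" label for a score of exactly zero are assumptions made for this example.

```python
# Ratings from Table 22-1: +20 leans toward control freak (more control),
# -20 leans toward armchair architect (less control).
SCENARIO_1 = {
    "team familiarity": +20,    # new team members
    "team size": -20,           # small (4 members)
    "overall experience": -20,  # all experienced
    "project complexity": -20,  # relatively simple
    "project duration": -20,    # 2 months
}


def accumulated_score(ratings):
    """Sum the per-factor ratings, checking each stays on the -20..+20 scale."""
    for factor, rating in ratings.items():
        if not -20 <= rating <= 20:
            raise ValueError(f"rating for {factor!r} is off the scale: {rating}")
    return sum(ratings.values())


def suggested_personality(score):
    """Map an accumulated score to the personality it leans toward.

    The 'balanced' label for a score of zero is an assumption of this sketch.
    """
    if score > 0:
        return "control freak"
    if score < 0:
        return "armchair architect"
    return "balanced"


score = accumulated_score(SCENARIO_1)
print(score, suggested_personality(score))
```

Because the score is only a relative indicator, it is worth recomputing whenever the team or the project changes rather than treating the first result as fixed.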
Consider another type of scenario, described in Table 22-2 and illustrated in Figure 22-8, where the team members know each other well, but the team is large (12 team members) and consists mostly of junior (inexperienced) developers. The project is relatively complex, with a duration of six months. In this case, the accumulated score comes out to +20, indicating that the effective architect should be involved in the day-to-day activities within the team and take on a mentoring and coaching role, but not so much as to disrupt the team.
Table 22-2. Scenario 2 example for amount of control
| Factor | Value | Rating | Personality |
| :--- | :--- | :--- | :--- |
| Team familiarity | Know each other well | -20 | Armchair architect |
| Team size | Large (12 members) | +20 | Control freak |
| Overall experience | Mostly junior | +20 | Control freak |
| Project complexity | High complexity | +20 | Control freak |
| Project duration | 6 months | -20 | Armchair architect |
| Accumulated score | | +20 | Control freak |
Figure 22-8. Amount of control for scenario 2
It is difficult to objectify these factors, as some of them (such as the overall team experience) might carry more weight than others. In these cases the metrics can easily be weighted or modified to suit any particular scenario or condition. Regardless, the primary message here is that the amount of control and involvement a software architect has on the team varies by these five main factors. By taking them into account, an architect can gauge what sort of control to exert on the team and what the box in which development teams work should look like (tight boundaries and constraints or loose ones).
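One way to weight the factors is sketched below. The ratings are those of Table 22-2; the `weighted_score` function and the choice to double the weight of overall experience are purely illustrative assumptions, not values the chapter prescribes.

```python
from typing import Dict, Optional

# Ratings from Table 22-2 (scenario 2).
SCENARIO_2 = {
    "team familiarity": -20,    # know each other well
    "team size": +20,           # large (12 members)
    "overall experience": +20,  # mostly junior
    "project complexity": +20,  # high complexity
    "project duration": -20,    # 6 months
}


def weighted_score(ratings: Dict[str, int],
                   weights: Optional[Dict[str, float]] = None) -> float:
    """Sum rating * weight per factor; factors without a weight default to 1.0."""
    weights = weights or {}
    return sum(rating * weights.get(factor, 1.0)
               for factor, rating in ratings.items())


# Hypothetical weighting: team experience counts twice as much as other factors.
EXPERIENCE_HEAVY = {"overall experience": 2.0}

print(weighted_score(SCENARIO_2))                    # equal weights
print(weighted_score(SCENARIO_2, EXPERIENCE_HEAVY))  # experience emphasized
```

With equal weights the ratings sum to a positive score; emphasizing the mostly junior team pushes the result further toward more involvement, which matches the intuition that inexperienced teams need more mentoring.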
Team Warning Signs
As indicated in the prior section, team size is one of the factors that influence the amount of control an architect should exert on a development team. The larger a team, the more control needed; the smaller the team, the less control needed. Three factors come into play when considering the most effective development team size:
Process loss
Pluralistic ignorance
Diffusion of responsibility
Process loss, otherwise known as Brooks’s law, was originally described by Fred Brooks in his book The Mythical Man-Month (Addison-Wesley). The basic idea of process loss is that the more people you add to a project, the more time the project will take. As illustrated in Figure 22-9, the group potential is defined by the collective efforts of everyone on the team. However, with any team, the actual productivity will always be less than the group potential, the difference being the process loss of the team.
Figure 22-9. Team size impacts actual productivity (Brooks’s law)
An effective software architect will observe the development team and look for process loss. Process loss is a good factor in determining the correct team size for a particular project or effort. One indication of process loss is frequent merge conflicts when pushing code to a repository, which suggests that team members are stepping on each other’s toes and working on the same code. Looking for areas of parallelism within the team and having team members work on separate services or areas of the application is one way to avoid process loss. Anytime a new team member comes on board a project, if there aren’t areas for creating parallel work streams, an effective architect will question why the new team member was added and demonstrate to the project manager the negative impact that additional person will have on the team.
Pluralistic ignorance also occurs as the team size gets too big. Pluralistic ignorance is when everyone agrees to (but privately rejects) a norm because they think they are missing something obvious. For example, suppose on a large team the majority agree that using messaging between two remote services is the best solution. One person on the team thinks this is a silly idea because of a secure firewall between the two services. Rather than speak up, however, that person also agrees to the use of messaging (while privately rejecting the idea) because they are afraid that they are either missing something obvious or might be seen as a fool if they were to speak up. In this case, the person rejecting the norm was correct: messaging would not work because of the secure firewall between the two remote services. Had they spoken up (and had the team size been smaller), the original solution would have been challenged and another protocol (such as REST) used instead, which would be a better solution in this case.
The concept of pluralistic ignorance was made famous by the Danish children’s story “The Emperor’s New Clothes,” by Hans Christian Andersen. In the story, the king is convinced that his new clothes are invisible to anyone unworthy to actually see them. He struts around totally nude, asking all of his subjects how they like his new clothes. All the subjects, afraid of being considered stupid or unworthy, respond to the king that his new clothes are the best thing ever. This folly continues until a child finally calls out to the king that he isn’t wearing any clothes at all.
An effective software architect should continually observe facial expressions and body language during any sort of collaborative meeting or discussion and act as a facilitator if they sense an occurrence of pluralistic ignorance. In this case, the effective architect might interrupt, ask the person what they think about the proposed solution, and then be on their side and support them when they speak up.
The third factor that indicates appropriate team size is called diffusion of responsibility. Diffusion of responsibility is based on the fact that as team size increases, communication suffers. Confusion about who is responsible for what on the team, and things getting dropped, are telltale signs of a team that is too large.
Look at the picture in Figure 22-10. What do you observe?
Figure 22-10. Diffusion of responsibility
This picture shows someone standing next to a broken-down car on the side of a small country road. In this scenario, how many people might stop and ask the motorist if everything is OK? Because it’s a small road in a small community, probably everyone who passes by. However, how many times have motorists been stuck on the side of a busy highway in the middle of a large city and had thousands of cars simply drive by without anyone stopping to ask if everything is OK? All the time. This is a good example of the diffusion of responsibility. As cities get busier and more crowded, people assume that the motorist has already called for help or that help is on the way, given the large number of people witnessing the event. However, in most of these cases help is not on the way, and the motorist is stuck with a dead or forgotten cell phone, unable to call for help.
An effective architect not only helps guide the development team through the implementation of the architecture, but also ensures that the team is healthy, happy, and working together to achieve a common goal. Looking for these three warning signs and helping to correct them helps to ensure an effective development team.
Leveraging Checklists 利用检查清单
Airline pilots use checklists on every flight, even the most experienced, seasoned veteran pilots. Pilots have checklists for takeoff, landing, and thousands of other situations, both common ones and unusual edge cases. They use checklists because one missed aircraft setting (such as setting the flaps to 10 degrees) or procedure (such as gaining clearance into a terminal control area) can mean the difference between a safe flight and a disastrous one. 航空公司飞行员在每次飞行中都使用检查清单,即使是最有经验的老练飞行员也是如此。飞行员有起飞、着陆以及成千上万其他常见和不寻常边缘情况的检查清单。他们使用检查清单是因为错过一个飞机设置(例如将襟翼设置为 10 度)或程序(例如获得进入终端控制区的许可)可能意味着安全飞行和灾难性飞行之间的区别。
Dr. Atul Gawande wrote an excellent book called The Checklist Manifesto (Picador), in which he describes the power of checklists for surgical procedures. Alarmed at the high rate of staph infections in hospitals, Dr. Gawande created surgical checklists to attempt to reduce this rate. In the book he demonstrates that staph infection rates in hospitals using the checklists went down to near zero, while staph infection rates in control hospitals not using the checklists continued to rise. 阿图尔·高瓦德博士写了一本名为《检查清单宣言》(Picador)的优秀书籍,在书中他描述了检查清单在外科手术中的重要性。由于对医院中金黄色葡萄球菌感染率高的担忧,高瓦德博士创建了外科检查清单,以试图降低这一比例。在书中,他展示了使用检查清单的医院中金黄色葡萄球菌感染率降至接近零,而未使用检查清单的对照医院中金黄色葡萄球菌感染率则持续上升。
Checklists work. They provide an excellent vehicle for making sure everything is covered and addressed. If checklists work so well, then why doesn’t the software development industry leverage them? We firmly believe through personal experience that checklists make a big difference in the effectiveness of development teams. However, there are caveats to this claim. First, most software developers are not flying airliners or performing open heart surgery. In other words, software developers don’t require checklists for everything. The key to making teams effective is knowing when to leverage checklists and when not to. 清单有效。它们提供了一个很好的工具,以确保所有内容都得到覆盖和处理。如果清单如此有效,那么为什么软件开发行业不利用它们呢?我们通过个人经验坚信,清单在开发团队的有效性上有很大差异。然而,这一说法有一些警告。首先,大多数软件开发人员并不是在驾驶大型客机或进行心脏手术。换句话说,软件开发人员并不需要对所有事情都使用清单。使团队有效的关键在于知道何时利用清单,何时不利用。
Consider the checklist shown in Figure 22-11 for creating a new database table. 请参考图 22-11 中显示的创建新数据库表的检查清单。
| Done 完成 | Task description 任务描述 |
| :---: | :--- |
| $\square$ | Determine database column field names and types 确定数据库列字段名称和类型 |
| $\square$ | Fill out database table request form 填写数据库表请求表单 |
| $\square$ | Obtain permission for new database table 获取新数据库表的权限 |
| $\square$ | Submit request form to database group 将请求表单提交给数据库组 |
| $\square$ | Verify table once created 创建后验证表格 |
Figure 22-11. Example of a bad checklist 图 22-11. 不良检查表示例
This is not a checklist, but a set of procedural steps, and as such should not be in a checklist. For example, the database table cannot be verified if the form has not yet been submitted! Any processes that have a procedural flow of dependent tasks should not be in a checklist. Simple, well-known processes that are executed frequently without error also do not need a checklist. 这不是一个检查清单,而是一组程序步骤,因此不应放在检查清单中。例如,如果表单尚未提交,则无法验证数据库表!任何具有依赖任务程序流程的过程都不应放在检查清单中。简单、众所周知的、频繁执行且没有错误的过程也不需要检查清单。
Processes that are good candidates for checklists are those that don’t have any procedural order or dependent tasks, as well as those that tend to be error-prone or have steps that are frequently missed or skipped. The key to making checklists effective is to not go overboard making everything a checklist. Architects find that checklists do, in fact, make development teams more effective, and as such start to make everything a checklist, invoking what is known as the law of diminishing returns. The more checklists an architect creates, the less chance developers will use them. Another key success factor when creating checklists is to make them as small as possible while still capturing all the necessary steps within a process. Developers generally will not follow checklists that are too big. Seek items that can be performed through automation and remove those from the checklist. 适合使用检查表的流程是那些没有任何程序顺序或依赖任务的流程,以及那些容易出错或经常遗漏或跳过步骤的流程。使检查表有效的关键是不要过度将所有内容都变成检查表。架构师发现,检查表确实使开发团队更有效,因此开始将所有内容都变成检查表,这引发了所谓的收益递减法则。架构师创建的检查表越多,开发人员使用它们的机会就越少。创建检查表时另一个关键成功因素是尽可能将其缩小,同时仍能捕捉到流程中的所有必要步骤。开发人员通常不会遵循过大的检查表。寻找可以通过自动化执行的项目,并将其从检查表中移除。
Don’t worry about stating the obvious in a checklist. It’s the obvious stuff that’s usually skipped or missed. 不要担心在清单中陈述显而易见的内容。通常被跳过或遗漏的正是这些显而易见的东西。
Three key checklists that we’ve found to be most effective are a developer code completion checklist, a unit and functional testing checklist, and a software release checklist. Each checklist is discussed in the following sections. 我们发现最有效的三个关键检查清单是开发者代码完成检查清单、单元和功能测试检查清单,以及软件发布检查清单。每个检查清单将在以下部分中讨论。
The Hawthorne Effect 霍桑效应
One of the issues associated with introducing checklists to a development team is making developers actually use them. It’s all too common for some developers to run out of time and simply mark all the items in a particular checklist as completed without having actually performed the tasks. 引入检查清单到开发团队的一个问题是让开发人员真正使用它们。某些开发人员常常因为时间不够而简单地将特定检查清单中的所有项目标记为已完成,而实际上并没有执行这些任务。
One of the ways of addressing this issue is by talking with the team about the importance of using checklists and how checklists can make a difference in the team. Have team members read The Checklist Manifesto by Atul Gawande to fully understand the power of a checklist, and make sure each team member understands the reasoning behind each checklist and why it is being used. Having developers collaborate on what should and shouldn’t be on a checklist also helps. 解决这个问题的一种方法是与团队讨论使用检查清单的重要性,以及检查清单如何对团队产生影响。让团队成员阅读 Atul Gawande 的《检查清单宣言》,以充分理解检查清单的力量,并确保每个团队成员理解每个检查清单背后的理由以及为什么要使用它。让开发人员共同协作确定检查清单上应该和不应该包含的内容也很有帮助。
When all else fails, architects can invoke what is known as the Hawthorne effect. The Hawthorne effect essentially means that if people know they are being observed or monitored, their behavior changes, and generally they will do the right thing. Examples include highly visible cameras in and around buildings that actually don’t work or aren’t really recording anything (this is very common!) and website monitoring software (how many of those reports are actually viewed?). 当其他方法都失败时,架构师可以引用所谓的霍桑效应。霍桑效应基本上意味着,如果人们知道自己正在被观察或监控,他们的行为会发生变化,通常他们会做正确的事情。例子包括在建筑物内外非常显眼的摄像头,这些摄像头实际上并不工作或并没有真正录制任何内容(这非常常见!)以及网站监控软件(有多少这样的报告实际上被查看?)。
The Hawthorne effect can be used to govern the use of checklists as well. An architect can let the team know that the use of checklists is critical to the team’s effectiveness, and as a result, all checklists will be verified to make sure the task was actually performed, when in fact the architect is only occasionally spot-checking the checklists for correctness. By leveraging the Hawthorne effect, developers will be much less likely to skip items or mark them as completed when in fact the task was not done. 霍桑效应也可以用来管理检查表的使用。架构师可以让团队知道,使用检查表对团队的有效性至关重要,因此,所有检查表都将被验证,以确保任务确实已执行,而实际上架构师只是偶尔抽查检查表的正确性。通过利用霍桑效应,开发人员将更不可能跳过项目,或在任务并未完成时将其标记为已完成。
Developer Code Completion Checklist 开发者代码完成检查清单
The developer code completion checklist is an effective tool to use, particularly when a software developer states that they are “done” with the code. It also is useful for defining what is known as the “definition of done.” If everything in the checklist is completed, then the developer can claim they are actually done with the code. 开发者代码完成检查清单是一个有效的工具,特别是在软件开发者声称他们“完成”代码时。它对于定义所谓的“完成定义”也很有用。如果检查清单中的所有内容都已完成,那么开发者可以声称他们实际上已经完成了代码。
Here are some of the things to include in a developer code completion checklist: 以下是开发者代码补全检查清单中应包含的一些内容:
Coding and formatting standards not included in automated tools 未包含在自动化工具中的编码和格式标准
Frequently overlooked items (such as absorbed exceptions) 经常被忽视的项目(例如吸收的异常)
Project-specific standards 项目特定标准
Special team instructions or procedures 特殊团队指示或程序
Figure 22-12 illustrates an example of a developer code completion checklist. 图 22-12 展示了一个开发者代码完成检查表的示例。
| Done 完成 | Task description 任务描述 |
| :---: | :--- |
| $\square$ | Run code cleanup and code formatting 运行代码清理和代码格式化 |
| $\square$ | Execute custom source validation tool 执行自定义源验证工具 |
| $\square$ | Verify the audit log is written for all updates 验证审计日志是否为所有更新写入 |
| $\square$ | Make sure there are no absorbed exceptions 确保没有被吸收的异常 |
| $\square$ | Check for hardcoded values and convert to constants 检查硬编码值并转换为常量 |
| $\square$ | Verify that only public methods are calling setFailure() 验证只有公共方法在调用 setFailure() |
| $\square$ | Include @ServiceEntrypoint on service API class 在服务 API 类上包含 @ServiceEntrypoint |
Figure 22-12. Example of a developer code completion checklist 图 22-12. 开发者代码完成检查清单示例
Notice the obvious tasks “Run code cleanup and code formatting” and “Make sure there are no absorbed exceptions” in the checklist. How many times has a developer been in a hurry, either at the end of the day or at the end of an iteration, and forgotten to run code cleanup and formatting from the IDE? Plenty of times. In The Checklist Manifesto, Gawande found this same phenomenon with respect to surgical procedures: the obvious steps were often the ones that were missed. 注意清单中明显的任务“运行代码清理和代码格式化”和“确保没有吸收的异常”。开发人员在一天结束或迭代结束时匆忙,忘记从 IDE 中运行代码清理和格式化的情况有多少次?很多次。在《清单宣言》中,Gawande 发现了与外科手术程序相关的同样现象:明显的步骤往往正是被遗漏的步骤。
Notice also the project-specific tasks in items 2, 3, 6, and 7. While these are good items to have in a checklist, an architect should always review the checklist to see if any items can be automated or written as a plug-in for a code validation checker. For example, while “Include @ServiceEntrypoint on service API class” might not lend itself to an automated check, “Verify that only public methods are calling setFailure()” certainly does (this is a straightforward automated check with any sort of code crawling tool). Checking for areas of automation helps reduce both the size and the noise within a checklist, making it more effective. 请注意第 2、3、6 和 7 项中的项目特定任务。虽然这些都是清单中很好的项目,但架构师应始终审查清单,以查看是否有任何项目可以自动化或编写为代码验证检查器的插件。例如,虽然“在服务 API 类上包含@ServiceEntrypoint”可能无法进行自动检查,但“验证只有公共方法调用 setFailure()”肯定可以(这是使用任何类型的代码爬虫工具进行的简单自动检查)。检查可自动化的部分有助于减少清单的大小和噪音,使其更有效。
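As a rough illustration of how the setFailure() item could move from the checklist into automation, here is a minimal sketch of a source scanner. It is not the custom validation tool mentioned in the checklist; the regular expressions, the `find_violations` function, and the sample `OrderService` class are all hypothetical, and the scanner assumes conventionally formatted Java (one top-level class, method headers on a single line, no braces inside strings or comments). A production check would more likely rely on a real parser or static-analysis framework.

```python
import re

# A method header on one line, e.g. "private void save(Order o) throws X {".
METHOD_HEADER = re.compile(r"^\s*([\w<>\[\], ]+?)\s+(\w+)\s*\([^)]*\)[^{;]*\{\s*$")
CALL = re.compile(r"\bsetFailure\s*\(")


def find_violations(java_source):
    """Return (method_name, line_number) pairs where a non-public method
    calls setFailure(). Relies on the formatting assumptions stated above."""
    violations = []
    depth = 0        # current brace nesting depth
    current = None   # (name, is_public) of the method being scanned
    for number, line in enumerate(java_source.splitlines(), start=1):
        # Method headers are only expected directly inside the class body.
        header = METHOD_HEADER.match(line) if depth == 1 else None
        if header:
            modifiers = header.group(1).split()
            current = (header.group(2), "public" in modifiers)
        elif current and not current[1] and CALL.search(line):
            violations.append((current[0], number))
        depth += line.count("{") - line.count("}")
        if depth <= 1:
            current = None  # back at class level, so the method body has ended
    return violations


# Hypothetical sample source used only to exercise the scanner.
SAMPLE = """\
public class OrderService {
    public void placeOrder(Order order) {
        if (!order.isValid()) {
            setFailure("invalid order");
        }
    }

    private void rollback(Order order) {
        setFailure("rollback");
    }
}
"""

print(find_violations(SAMPLE))
```

Run against the sample, the scanner flags the call inside the private `rollback` method and ignores the one inside the public `placeOrder` method. Wired into the build, a check along these lines would let the architect drop the item from the checklist.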
Unit and Functional Testing Checklist 单元和功能测试清单
Perhaps one of the most effective checklists is a unit and functional testing checklist. This checklist contains some of the more unusual and edge-case tests that software developers tend to forget to test. Whenever someone from QA finds an issue with the code based on a particular test case, that test case should be added to this checklist. 也许最有效的检查清单之一是单元和功能测试检查清单。这个检查清单包含了一些软件开发人员往往会忘记测试的较为不寻常和边缘情况的测试。每当质量保证团队的某个人根据特定的测试用例发现代码存在问题时,该测试用例应被添加到这个检查清单中。
This particular checklist is usually one of the largest ones due to all the types of tests that can be run against code. The purpose of this checklist is to ensure the most complete coding possible so that when the developer is done with the checklist, the code is essentially production ready. 这个特定的检查清单通常是最大的之一,因为可以对代码运行的测试类型很多。这个检查清单的目的是确保尽可能完整的编码,以便当开发人员完成检查清单时,代码基本上是准备好投入生产的。
Here are some of the items found in a typical unit and functional testing checklist: 以下是典型单元和功能测试检查表中发现的一些项目:
Special characters in text and numeric fields 文本和数字字段中的特殊字符
Minimum and maximum value ranges 最小值和最大值范围
Unusual and extreme test cases 不寻常和极端的测试用例
Missing fields 缺失字段
As with the developer code completion checklist, any items that can be written as automated tests should be removed from the checklist. For example, suppose there is an item in the checklist for a stock trading application to test for negative shares (such as a BUY for -1,000 shares of Apple [AAPL]). If this check is automated through a unit or functional test within the test suite, then the item should be removed from the checklist. 与开发者代码完成检查表一样,任何可以作为自动化测试编写的项目都应从检查表中删除。例如,假设检查表中有一个针对股票交易应用程序的项目,用于测试负股数(例如,购买 -1,000 股苹果 [AAPL])。如果通过测试套件中的单元测试或功能测试自动化了此检查,则该项目应从检查表中删除。
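To make the negative-shares example concrete, the following sketch shows what such an automated test might look like. The `validate_order` function and its rules are hypothetical stand-ins for whatever validation the trading application actually performs; only the idea of rejecting a BUY for -1,000 shares comes from the example above.

```python
import unittest


def validate_order(side, symbol, shares):
    """Hypothetical order validation: reject unknown sides and
    non-positive share counts, otherwise return the order."""
    if side not in ("BUY", "SELL"):
        raise ValueError(f"unknown side: {side}")
    if shares <= 0:
        raise ValueError("shares must be a positive number")
    return {"side": side, "symbol": symbol, "shares": shares}


class NegativeSharesTest(unittest.TestCase):
    def test_buy_with_negative_shares_is_rejected(self):
        # The checklist item: a BUY for -1,000 shares of AAPL must fail.
        with self.assertRaises(ValueError):
            validate_order("BUY", "AAPL", -1000)

    def test_buy_with_positive_shares_is_accepted(self):
        order = validate_order("BUY", "AAPL", 1000)
        self.assertEqual(order["shares"], 1000)
```

Once a test like this runs in the suite on every build (for example via `python -m unittest`), the manual item no longer needs to appear in the checklist.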
Developers sometimes don’t know where to start when writing unit tests or how many unit tests to write. This checklist provides a way of making sure general or specific test scenarios are included in the process of developing the software. This checklist is also effective in bridging the gap between developers and testers in environments that have these activities performed by separate teams. The more development teams perform complete testing, the easier the job of the testing teams, allowing the testing teams to focus on certain business scenarios not covered in the checklists. 开发人员有时不知道在编写单元测试时从何开始或应该编写多少单元测试。这个检查清单提供了一种确保在软件开发过程中包含一般或特定测试场景的方法。这个检查清单在开发人员和测试人员之间架起了桥梁,特别是在这些活动由不同团队执行的环境中。开发团队进行全面测试的次数越多,测试团队的工作就越轻松,从而使测试团队能够专注于检查清单中未涵盖的特定业务场景。
Software Release Checklist 软件发布检查清单
Releasing software into production is perhaps one of the most error-prone aspects of the software development life cycle, and as such makes for a great checklist. This checklist helps avoid failed builds and failed deployments, and it significantly reduces the amount of risk associated with releasing software. 将软件发布到生产环境中可能是软件开发生命周期中最容易出错的方面之一,因此它成为了一个很好的检查清单。这个检查清单有助于避免构建失败和部署失败,并显著降低与软件发布相关的风险。
The software release checklist is usually the most volatile of the checklists in that it continually changes to address new errors and circumstances each time a deployment fails or has issues. 软件发布检查清单通常是所有检查清单中变化最频繁的,因为它会不断变化以应对每次部署失败或出现问题时的新错误和情况。
Here are some of the items typically included within the software release checklist: 以下是软件发布检查清单中通常包含的一些项目:
Configuration changes in servers or external configuration servers 服务器或外部配置服务器中的配置更改
Third-party libraries added to the project (JAR, DLL, etc.) 添加到项目中的第三方库(JAR、DLL 等)
Database updates and corresponding database migration scripts 数据库更新和相应的数据库迁移脚本
Anytime a build or deployment fails, the architect should analyze the root cause of the failure and add a corresponding entry to the software release checklist. This way the item will be verified on the next build or deployment, preventing that issue from happening again. 每当构建或部署失败时,架构师应分析失败的根本原因,并在软件发布检查表中添加相应的条目。这样,该项目将在下一个构建或部署中得到验证,从而防止该问题再次发生。
Providing Guidance 提供指导
A software architect can also make teams effective by providing guidance through the use of design principles. This also helps form the box (constraints), as described in the first section of this chapter, that developers can work in to implement the architecture. Effectively communicating these design principles is one of the keys to creating a successful team. 软件架构师还可以通过提供设计原则的指导来提高团队的效率。这也有助于形成本章第一部分所描述的框架(约束),开发人员可以在其中工作以实现架构。有效地传达这些设计原则是创建成功团队的关键之一。
To illustrate this point, consider providing guidance to a development team regarding the use of what is typically called the layered stack: the collection of third-party libraries (such as JAR files and DLLs) that make up the application. Development teams usually have lots of questions regarding the layered stack, including whether they can make their own decisions about various libraries, which ones are OK, and which ones are not. 为了说明这一点,可以考虑为开发团队提供关于通常称为分层堆栈的使用指导,即构成应用程序的第三方库的集合(如 JAR 文件和 DLL)。开发团队通常对分层堆栈有很多问题,包括他们是否可以自行决定使用哪些库,哪些是可以的,哪些是不可以的。
Using this example, an effective software architect can provide guidance to the development team by first having the developer answer the following questions: 通过这个例子,一个有效的软件架构师可以通过让开发人员首先回答以下问题来为开发团队提供指导:
Are there any overlaps between the proposed library and existing functionality within the system? 提议的库与系统内现有功能之间是否存在重叠?
What is the justification for the proposed library? 提议的库的理由是什么?
The first question guides developers to look at the existing libraries to see if the functionality provided by the new library can be satisfied through an existing library or existing functionality. It has been our experience that developers sometimes skip this activity, creating lots of duplicate functionality, particularly in large projects with large teams. 第一个问题引导开发人员查看现有库,以确定新库提供的功能是否可以通过现有库或现有功能来满足。我们的经验是,开发人员有时会跳过这一活动,导致大量重复功能,特别是在大型项目和大型团队中。
The second question prompts the developer to question whether the new library or functionality is truly needed. Here, an effective software architect will ask for both a technical justification and a business justification for the additional library. This can be a powerful technique to create awareness within the development team of the need for business justifications. 第二个问题促使开发者质疑是否真的需要新的库或功能。在这里,一个有效的软件架构师会要求同时提供技术上的理由和商业上的理由,说明为什么需要额外的库。这可以成为一种强有力的技术,提升开发团队对商业理由需求的意识。
The Impact of Business Justifications 商业理由的影响
One of your authors (Mark) was the lead architect on a particularly complex Java-based project with a large development team. One of the team members was particularly obsessed with the Scala programming language and desperately wanted to use it on the project. This desire for the use of Scala ended up becoming so disruptive that several key team members informed Mark that they were planning on leaving the project and moving on to other, “less toxic,” environments. Mark convinced the two key team members to hold off on their decision for a bit and had a discussion with the Scala enthusiast. Mark told the Scala enthusiast that he would support the use of Scala within the project, but the Scala enthusiast would have to provide a business justification for the use of Scala because of the training costs and rewriting effort involved. The Scala enthusiast was ecstatic and said he would get right on it, and he left the meeting yelling, “Thank you, you’re the best!” 你们的作者之一(Mark)是一个特别复杂的基于 Java 的项目的首席架构师,该项目有一个大型开发团队。团队中的一名成员对 Scala 编程语言特别痴迷,并迫切希望在项目中使用它。这种对 Scala 使用的渴望最终变得如此具有破坏性,以至于几位关键团队成员告知 Mark,他们计划离开该项目,转向其他“毒性更小”的环境。Mark 说服了两位关键团队成员暂时搁置他们的决定,并与 Scala 爱好者进行了讨论。Mark 告诉 Scala 爱好者,他会支持在项目中使用 Scala,但 Scala 爱好者必须提供使用 Scala 的商业理由,因为涉及到培训成本和重写工作。Scala 爱好者欣喜若狂地说他会立即着手处理,并在离开会议时大喊:“谢谢你,你是最棒的!”
The next day the Scala enthusiast came into the office completely transformed. He immediately approached Mark and asked to speak with him. They both went into the conference room, and the Scala enthusiast immediately (and humbly) said, “Thank you.” The Scala enthusiast explained to Mark that he could come up with all the technical reasons in the world to use Scala, but none of those technical advantages had any sort of business value in terms of the architecture characteristics needed ("-ilities"): cost, budget, and timeline. In fact, the Scala enthusiast realized that the increase in cost, budget, and timeline would provide no benefit whatsoever. 第二天,Scala 爱好者走进办公室,完全变了一个人。他立刻走向 Mark,请求与他交谈。他们一起进入会议室,Scala 爱好者立刻(并谦逊地)说:“谢谢。”Scala 爱好者向 Mark 解释说,他可以提出所有使用 Scala 的技术理由,但这些技术优势在所需的架构特性(“-ilities”)方面并没有任何商业价值:成本、预算和时间表。事实上,Scala 爱好者意识到,成本、预算和时间表的增加根本不会带来任何好处。
Realizing what a disruption he was, the Scala enthusiast quickly transformed himself into one of the best and most helpful members on the team, all because of being asked to provide a business justification for something he wanted on the project. This increased awareness of justifications not only made him a better software developer, but also made for a stronger and healthier team. 意识到自己造成的干扰,这位 Scala 爱好者迅速转变为团队中最优秀和最乐于助人的成员之一,这一切都源于被要求为他在项目中想要的东西提供商业理由。这种对理由的增强意识不仅使他成为了更好的软件开发者,也使团队变得更强大和更健康。
As a postscript, the two key developers stayed on the project until the very end. 作为附言,这两位关键开发者一直留在项目中直到最后。
Continuing with the example of governing the layered stack, another effective technique of communicating design principles is through graphical explanations about what the development team can make decisions on and what they can’t. The illustration in Figure 22-13 is an example of what this graphic (as well as the guidance) might look like for controlling the layered stack. 继续以管理分层堆栈的例子,传达设计原则的另一种有效技术是通过图形解释开发团队可以做出决策的内容以及他们不能做的内容。图 22-13 中的插图是控制分层堆栈的图形(以及指导)可能的样子。
Figure 22-13. Providing guidance for the layered stack 图 22-13. 为分层堆栈提供指导
In Figure 22-13, an architect would provide examples of what each category of the third-party library would contain and then what the guidance is (the design principle) in terms of what the developers can and can’t do (the box described in the first section of the chapter). For example, here are the three categories defined for any third-party library: 在图 22-13 中,架构师将提供每个第三方库类别所包含的示例,以及在开发人员可以和不能做的方面的指导(本章第一部分描述的框)。例如,以下是为任何第三方库定义的三个类别:
Special purpose 特殊目的
These are specific libraries used for things like PDF rendering, bar code scanning, and other circumstances that do not warrant writing custom software. 这些是用于 PDF 渲染、条形码扫描以及其他不值得编写自定义软件的情况的特定库。
General purpose 通用目的
These libraries are wrappers on top of the language API, and they include things like Apache Commons and Guava for Java. 这些库是语言 API 之上的封装,包括用于 Java 的 Apache Commons 和 Guava 等。
Framework 框架
These libraries are used for things like persistence (such as Hibernate) and inversion of control (such as Spring). In other words, these libraries make up an entire layer or structure of the application and are highly invasive. 这些库用于持久性(例如 Hibernate)和控制反转(例如 Spring)等功能。换句话说,这些库构成了应用程序的整个层或结构,并且具有很强的侵入性。
Once categorized (the preceding categories are only an example; many more can be defined), the architect then creates the box around this design principle. Notice in the example illustrated in Figure 22-13 that for this particular application or project, the architect has specified that for special-purpose libraries, the developer can make the decision and does not need to consult the architect for that library. However, notice that for general-purpose libraries, the architect has indicated that the developer can undergo overlap analysis and justification to make the recommendation, but that category of library requires architect approval. Finally, framework libraries are an architect decision. In other words, the development teams shouldn’t even undergo analysis for these types of libraries; the architect has decided to take on that responsibility. 一旦分类(前面的类别只是一个示例,可以定义更多类别),架构师就会围绕这个设计原则创建框。请注意,在图 22-13 中所示的示例中,对于这个特定的应用程序或项目,架构师已指定对于特殊用途的库,开发者可以做出决定,并且不需要就该库咨询架构师。然而,请注意,对于通用库,架构师已表明开发人员可以进行重叠分析和论证以做出推荐,但该类别的库需要架构师的批准。最后,框架库由架构师决定。换句话说,开发团队甚至不应该对这些类型的库进行分析;架构师已决定承担这一责任。
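The guidance for the three example categories can also be written down as data, which some teams find easier to publish and review than a diagram alone. The sketch below is one possible encoding, not something the figure prescribes; the `Category` and `Guidance` types, the field names, and the wording returned by `next_step` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    SPECIAL_PURPOSE = "special purpose"
    GENERAL_PURPOSE = "general purpose"
    FRAMEWORK = "framework"


@dataclass(frozen=True)
class Guidance:
    developer_may_propose: bool  # may a developer bring the library forward?
    architect_approval: bool     # must the architect sign off?


# Encodes the example guidance for this particular project only.
GUIDANCE = {
    Category.SPECIAL_PURPOSE: Guidance(developer_may_propose=True, architect_approval=False),
    Category.GENERAL_PURPOSE: Guidance(developer_may_propose=True, architect_approval=True),
    Category.FRAMEWORK: Guidance(developer_may_propose=False, architect_approval=True),
}


def next_step(category):
    """Describe who decides on a library in the given category."""
    rule = GUIDANCE[category]
    if not rule.developer_may_propose:
        return "architect decides"
    if rule.architect_approval:
        return "developer submits overlap analysis and justification; architect approves"
    return "developer decides"


for category in Category:
    print(f"{category.value}: {next_step(category)}")
```

Adding a fourth category then means adding one enum member and one `GUIDANCE` entry, which keeps the box explicit as the rules evolve.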
Summary 摘要
Making development teams effective is hard work. It requires lots of experience and practice, as well as strong people skills (which we will discuss in subsequent chapters in this book). That said, the simple techniques described in this chapter about elastic leadership, leveraging checklists, and providing guidance through effectively communicating design principles do, in fact, work, and have proven effective in making development teams work smarter and more effectively. 使开发团队高效是一项艰巨的工作。这需要大量的经验和实践,以及强大的人际交往能力(我们将在本书后面的章节中讨论这一点)。尽管如此,本章中关于弹性领导、利用检查清单和通过有效沟通设计原则提供指导的简单技巧确实有效,并已被证明能够使开发团队更聪明、更高效地工作。
One might question the role of an architect for such activities, instead assigning the effort of making teams effective to the development manager or project manager. We strongly disagree with this premise. A software architect not only provides technical guidance to the team, but also leads the team through the implementation of the architecture. The close collaborative relationship between a software architect and a development team allows the architect to observe the team dynamics and hence facilitate changes to make the team more effective. This is exactly what differentiates a technical architect from an effective software architect. 人们可能会质疑架构师在此类活动中的角色,而将提高团队效率的工作分配给开发经理或项目经理。我们对此观点强烈反对。软件架构师不仅为团队提供技术指导,还引导团队实施架构。软件架构师与开发团队之间的紧密合作关系使架构师能够观察团队动态,从而促进改变,使团队更有效。这正是技术架构师与有效软件架构师之间的区别所在。
Negotiation and Leadership Skills 谈判与领导技能
Negotiation and leadership skills are hard skills to obtain. It takes many years of learning, practice, and “lessons learned” experiences to gain the necessary skills to become an effective software architect. Although this book cannot make an architect an expert in negotiation and leadership overnight, the techniques introduced in this chapter provide a good starting point for gaining these important skills. 谈判和领导技能是难以获得的技能。需要多年的学习、实践和“经验教训”才能获得成为有效软件架构师所需的技能。虽然这本书无法让架构师在一夜之间成为谈判和领导方面的专家,但本章介绍的技术为获得这些重要技能提供了一个良好的起点。
Negotiation and Facilitation 谈判与促进
In the beginning of this book, we listed the core expectations of an architect, the last being the expectation that a software architect must understand the political climate of the enterprise and be able to navigate the politics. The reason for this key expectation is that almost every decision a software architect makes will be challenged. Decisions will be challenged by developers who think they know more than the architect and hence have a better approach. Decisions will be challenged by other architects within the organization who think they have a better idea or way of approaching the problem. Finally, decisions will be challenged by stakeholders who will argue that the decision is too expensive or will take too much time. 在本书的开头,我们列出了架构师的核心期望,最后一个期望是软件架构师必须理解企业的政治气候并能够应对政治。这一关键期望的原因在于,几乎每一个软件架构师所做的决策都会受到挑战。开发人员会质疑这些决策,他们认为自己比架构师更了解,因此有更好的方法。组织内的其他架构师也会质疑这些决策,他们认为自己有更好的想法或解决问题的方法。最后,利益相关者也会质疑这些决策,他们会争辩说这个决策太昂贵或需要太多时间。
Consider the decision of an architect to use database clustering and federation (using separate physical domain-scoped database instances) to mitigate risk with regard to overall availability within a system. While this is a sound solution to the issue of database availability, it is also a costly decision. In this example, the architect must negotiate with key business stakeholders (those paying for the system) to come to an agreement about the trade-off between availability and cost. 考虑架构师决定使用数据库集群和联邦(使用单独的物理域范围数据库实例)来降低系统整体可用性风险的情况。虽然这是解决数据库可用性问题的合理方案,但这也是一个昂贵的决定。在这个例子中,架构师必须与关键业务利益相关者(为系统付费的那些人)进行协商,以达成关于可用性和成本之间权衡的协议。
Negotiation is one of the most important skills a software architect can have. Effective software architects understand the politics of the organization, have strong negotiation and facilitation skills, and can overcome disagreements when they occur to create solutions that all stakeholders agree on. 谈判是软件架构师最重要的技能之一。有效的软件架构师理解组织的政治,具备强大的谈判和促进技能,并能够在出现分歧时克服这些分歧,以创造所有利益相关者都同意的解决方案。
Negotiating with Business Stakeholders 与业务利益相关者谈判
Consider the following real-world scenario (scenario 1) involving a key business stakeholder and lead architect: 考虑以下涉及关键业务利益相关者和首席架构师的现实场景(场景 1):
Scenario 1 场景 1
The senior vice president project sponsor is insistent that the new trading system must support five nines of availability (99.999%). However, the lead architect is convinced, based on research, calculations, and knowledge of the business domain and technology, that three nines of availability (99.9%) would be sufficient. The problem is, the project sponsor does not like to be wrong or corrected and really hates people who are condescending. The sponsor isn’t overly technical (but thinks they are) and as a result tends to get involved in the nonfunctional aspects of projects. The architect must convince the project sponsor through negotiation that three nines (99.9%) of availability would be enough. 高级副总裁项目赞助人坚持认为新的交易系统必须支持五个九的可用性(99.999%)。然而,首席架构师基于研究、计算以及对业务领域和技术的了解,坚信三个九的可用性(99.9%)足够。问题在于,项目赞助人不喜欢被纠正或犯错,并且非常讨厌居高临下的人。赞助人并不是特别懂技术(但认为自己懂),因此往往会参与项目的非功能性方面。架构师必须通过谈判说服项目赞助人,三个九(99.9%)的可用性就足够了。
In this sort of negotiation, the software architect must be careful to not be too egotistical and forceful in their analysis, but also make sure they are not missing anything that might prove them wrong during the negotiation. There are several key negotiation techniques an architect can use to help with this sort of stakeholder negotiation. 在这种谈判中,软件架构师必须小心不要过于自负和强势地进行分析,同时也要确保在谈判过程中没有遗漏任何可能证明他们错误的内容。架构师可以使用几种关键的谈判技巧来帮助进行这种利益相关者的谈判。
Leverage the use of grammar and buzzwords to better understand the situation. 利用语法和流行词汇来更好地理解情况。
Phrases such as “we must have zero downtime” and “I needed those features yesterday” are generally meaningless but nevertheless provide valuable information to the architect about to enter into a negotiation. For example, when the project sponsor is asked when a particular feature is needed and responds, “I needed it yesterday,” that is an indication to the software architect that time to market is important to that stakeholder. Similarly, the phrase “this system must be lightning fast” means performance is a big concern. The phrase “zero downtime” means that availability is critical in the application. An effective software architect will leverage this sort of nonsense grammar to better understand the real concerns and consequently leverage that use of grammar during a negotiation. 诸如“我们必须实现零停机时间”和“我昨天就需要那些功能”这样的短语通常是没有意义的,但仍然为即将进入谈判的架构师提供了有价值的信息。例如,当项目赞助人被问及何时需要某个特定功能并回答“我昨天就需要它”时,这向软件架构师表明,上市时间对该利益相关者很重要。同样,“这个系统必须非常快速”意味着性能是一个重大关切。“零停机时间”这一短语意味着可用性在应用程序中至关重要。一位有效的软件架构师将利用这种无意义的语法来更好地理解真正的关切,从而在谈判中利用这种语法。
Consider scenario 1 described previously. Here, the key project sponsor wants five nines of availability. Leveraging this technique tells the architect that availability is very important. This leads to a second negotiation technique: 考虑之前描述的场景 1。在这里,关键项目赞助人希望实现五个九的可用性。利用这一技术告诉架构师可用性非常重要。这引出了第二种谈判技巧:
Gather as much information as possible before entering into a negotiation. 在进入谈判之前,尽可能多地收集信息。
The phrase “five nines” is grammar that indicates high availability. However, what exactly is five nines of availability? Researching this ahead of time and gathering knowledge prior to the negotiation yields the information shown in Table 23-1. 短语“five nines”是指高可用性的术语。然而,五个九的可用性到底是什么?提前进行研究并在谈判之前收集知识可以获得表 23-1 中所示的信息。
Table 23-1. Nines of availability 表 23-1. 可用性的九个等级
| Percentage uptime 百分比正常运行时间 | Downtime per year (per day) 每年停机时间(每天) |
| :--- | :--- |
| 90.0% (one nine) | 36 days 12 hrs (2.4 hrs) 36 天 12 小时(2.4 小时) |
“Five nines” of availability is about 5 minutes and 15 seconds of downtime per year, or roughly 1 second a day of unplanned downtime. Quite ambitious, but also quite costly and unnecessary for the prior example. Putting things in hours and minutes (or in this case, seconds) is a much better way to have the conversation than sticking with the nines vernacular. “五个九”的可用性是每年约 5 分钟 15 秒的停机时间,或每天约 1 秒的计划外停机时间。这相当雄心勃勃,但对于之前的例子来说也是相当昂贵和不必要的。用小时和分钟(或在这种情况下,秒)来表达事情比坚持使用“几个九”的说法要好得多。
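The conversion from an uptime percentage to allowed downtime is simple arithmetic, and having it at hand makes it easy to check figures such as these before a negotiation. The sketch below assumes a 365-day year and treats all downtime as unplanned; the function and constant names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365  # assumption: a non-leap year


def downtime_seconds(uptime_percent):
    """Return (seconds per year, seconds per day) of allowed downtime
    for a given uptime percentage."""
    unavailable = 1 - uptime_percent / 100
    per_day = unavailable * SECONDS_PER_DAY
    return per_day * DAYS_PER_YEAR, per_day


for uptime in (90.0, 99.9, 99.999):
    per_year, per_day = downtime_seconds(uptime)
    minutes, seconds = divmod(round(per_year), 60)
    print(f"{uptime}% uptime: {minutes} min {seconds} sec per year, "
          f"{per_day:.1f} sec per day")
```

Printing the result in minutes and seconds rather than as a percentage mirrors the advice above: concrete durations are easier for a stakeholder to weigh than a count of nines.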
Negotiating scenario 1 would include validating the stakeholder’s concerns (“I understand that availability is very important for this system”) and then bringing the negotiation from the nines vernacular to one of reasonable hours and minutes of unplanned downtime. Three nines (which the architect deemed good enough) averages 86 seconds of unplanned downtime per day, certainly a reasonable number given the context of the global trading system described in the scenario. The architect can always resort to this tip: 谈判场景 1 将包括认可利益相关者的担忧(“我理解可用性对这个系统非常重要”),然后将谈判从“几个九”的行话转变为合理的小时和分钟的计划外停机时间。三个九(架构师认为足够好)平均每天计划外停机 86 秒,考虑到场景中描述的全球交易系统,这无疑是一个合理的数字。架构师总是可以借助这个技巧:
When all else fails, state things in terms of cost and time. 当其他方法都失败时,用成本和时间来表述事情。
We recommend saving this negotiation tactic for last. We’ve seen too many negotiations start off on the wrong foot due to opening statements such as, “That’s going to cost a lot of money” or “We don’t have time for that.” Money and time (effort involved) are certainly key factors in any negotiation but should be used as a last resort so that other justifications and rationalizations that matter more can be tried first. Once an agreement is reached, then cost and time can be considered if they are important attributes to the negotiation.
Another important negotiation technique to always remember is the following, particularly in situations as described in scenario 1: 另一个重要的谈判技巧是始终记住以下内容,特别是在场景 1 中描述的情况:
Leverage the “divide and conquer” rule to qualify demands or requirements. 利用“分而治之”原则来确定需求或要求。
The ancient Chinese warrior Sun Tzu wrote in The Art of War, “If his forces are united, separate them.” This same divide-and-conquer tactic can be applied by an architect during negotiations as well. Consider scenario 1 previously described. In this case, the project sponsor is insisting on five nines (99.999%) of availability for the new trading system. However, does the entire system need five nines of availability? Qualifying the requirement to the specific area of the system actually requiring five nines of availability reduces the scope of difficult (and costly) requirements and the scope of the negotiation as well. 古代中国战士孙子在《孙子兵法》中写道:“如果他的军队团结在一起,就将他们分开。”这种分而治之的策略同样可以在架构师的谈判中应用。考虑之前描述的场景 1。在这种情况下,项目赞助人坚持要求新交易系统的可用性达到五个九(99.999%)。然而,整个系统真的需要五个九的可用性吗?将这一要求限定在系统中实际需要五个九可用性的特定区域,可以减少困难(和昂贵)要求的范围以及谈判的范围。
Negotiating with Other Architects 与其他架构师的谈判
Consider the following actual scenario (scenario 2) between a lead architect and another architect on the same project: 考虑以下实际场景(场景 2),这是一个首席架构师与同一项目的另一位架构师之间的对话:
Scenario 2 场景 2
The lead architect on a project believes that asynchronous messaging would be the right approach for communication between a group of services to increase both performance and scalability. However, the other architect on the project once again strongly disagrees and insists that REST would be a better choice, because REST is always faster than messaging and can scale just as well (“see for yourself by Googling it!”). This is not the first heated debate between the two architects, nor will it be the last. The lead architect must convince the other architect that messaging is the right solution. 项目的首席架构师认为,异步消息传递将是服务组之间通信的正确方法,以提高性能和可扩展性。然而,项目中的另一位架构师再次强烈反对,并坚持认为 REST 会是更好的选择,因为 REST 总是比消息传递更快,并且同样可以扩展(“自己去谷歌一下!”)。这并不是两位架构师之间第一次激烈的争论,也不会是最后一次。首席架构师必须说服另一位架构师,消息传递是正确的解决方案。
In this scenario, the lead architect can certainly tell the other architect that their opinion doesn’t matter and ignore it based on the lead architect’s seniority on the project. However, this will only lead to further animosity between the two architects and create an unhealthy and noncollaborative relationship, and consequently will end up having a negative impact on the development team. The following technique will help in these types of situations: 在这种情况下,首席架构师当然可以告诉其他架构师他们的意见不重要,并根据首席架构师在项目中的资历忽略它。然而,这只会导致两位架构师之间进一步的敌意,并造成一种不健康和不合作的关系,从而最终对开发团队产生负面影响。以下技术将有助于应对这些类型的情况:
Always remember that demonstration defeats discussion. 永远记住,演示胜于讨论。
Rather than arguing with another architect over the use of REST versus messaging, the lead architect should demonstrate to the other architect how messaging would be a better choice in their specific environment. Every environment is different, which is why simply Googling it will never yield the correct answer. By running a comparison between the two options in a production-like environment and showing the other architect the results, the argument would likely be avoided. 与另一位架构师就使用 REST 与消息传递进行争论,不如首席架构师向另一位架构师展示在他们特定环境中消息传递将是更好的选择。每个环境都是不同的,这就是为什么简单地搜索它永远不会得到正确答案的原因。通过在类似生产的环境中对这两种选项进行比较,并向另一位架构师展示结果,争论很可能会避免。
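One way to run such a comparison is a small timing harness that exercises both options under identical conditions and reports latency figures side by side. The sketch below is an illustration only: `call_via_rest` and `call_via_messaging` are hypothetical stand-ins (here they merely sleep) that would be replaced with real calls against the production-like environment, and the run count and the choice of median and 95th percentile are assumptions.

```python
import math
import statistics
import time


def measure(call, runs=50):
    """Invoke `call` `runs` times and return each latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000)
    return samples


def summarize(name, samples):
    """Reduce raw latencies to a median and a nearest-rank 95th percentile."""
    ordered = sorted(samples)
    p95_index = math.ceil(0.95 * len(ordered)) - 1
    return {
        "option": name,
        "runs": len(ordered),
        "median_ms": statistics.median(ordered),
        "p95_ms": ordered[p95_index],
    }


# Hypothetical stand-ins: replace the bodies with the real service calls.
def call_via_rest():
    time.sleep(0.001)


def call_via_messaging():
    time.sleep(0.001)


if __name__ == "__main__":
    for name, call in [("REST", call_via_rest),
                       ("messaging", call_via_messaging)]:
        print(summarize(name, measure(call)))
```

Because the stand-ins are placeholders, the numbers this prints say nothing about REST or messaging. The point is that results measured in the team's own environment give both architects something concrete to look at together.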
Another key negotiation technique that works in these situations is as follows: 在这些情况下,另一个有效的谈判技巧如下:
Avoid being too argumentative or letting things get too personal in a negotiation; calm leadership combined with clear and concise reasoning will always win a negotiation.
This technique is a very powerful tool when dealing with adversarial relationships like the one described in scenario 2. Once things get too personal or argumentative, the best thing to do is stop the negotiation and reengage at a later time when both parties have calmed down. Arguments will happen between architects; however, approaching these situations with calm leadership will usually force the other person to back down when things get too heated. 这种技术在处理像场景 2 中描述的对抗关系时是一种非常强大的工具。一旦事情变得过于个人化或争论,最好的做法是停止谈判,并在双方都冷静下来后再重新接触。建筑师之间会发生争论;然而,以冷静的领导方式处理这些情况通常会迫使对方在事情变得过于激烈时退让。
Negotiating with Developers 与开发人员谈判
Effective software architects don’t leverage their title as architect to tell developers what to do. Rather, they work with development teams to gain respect so that when a request is made of the development team, it doesn’t end up in an argument or resentment. Working with development teams can be difficult at times. In many cases development teams feel disconnected from the architecture (and also the architect), and as a result feel left out of the loop with regard to decisions the architect makes. This is a classic example of the Ivory Tower architecture anti-pattern. Ivory tower architects are ones who simply dictate from on high, telling development teams what to do without regard to their opinion or concerns. This usually leads to a loss of respect for the architect and an eventual breakdown of the team dynamics. One negotiation technique that can help address this situation is to always provide a justification: 有效的软件架构师不会利用他们的架构师头衔来告诉开发人员该做什么。相反,他们与开发团队合作,以获得尊重,这样当对开发团队提出请求时,就不会陷入争论或怨恨。与开发团队合作有时可能会很困难。在许多情况下,开发团队感到与架构(以及架构师)脱节,因此在架构师做出的决策方面感到被排除在外。这是典型的“象牙塔”架构反模式的例子。象牙塔架构师是那些高高在上地发号施令的人,他们告诉开发团队该做什么,而不考虑他们的意见或关切。这通常会导致对架构师的尊重丧失,并最终导致团队动态的崩溃。一种可以帮助解决这种情况的谈判技巧是始终提供理由:
When convincing developers to adopt an architecture decision or to do a specific task, provide a justification rather than “dictating from on high.” 在说服开发人员采纳架构决策或执行特定任务时,提供理由,而不是“高高在上地命令”。
By providing a reason why something needs to be done, developers will more likely agree with the request. For example, consider the following conversation between an architect and a developer with regard to making a simple query within a traditional n-tiered layered architecture: 通过提供一个理由说明为什么需要做某件事,开发人员更可能同意这个请求。例如,考虑以下关于在传统的 n 层分层架构中进行简单查询的架构师与开发人员之间的对话:
Architect: “You must go through the business layer to make that call.” 架构师:“你必须通过业务层来进行那个调用。”
Developer: “No. It’s much faster just to call the database directly.” 开发者:“不。直接调用数据库要快得多。”
There are several things wrong with this conversation. First, notice the use of the words “you must.” This type of commanding voice is not only demeaning, but is one of the worst ways to begin a negotiation or conversation. Also notice that the developer responded to the architect’s demand with a reason to counter the demand (going through the business layer will be slower and take more time). Now consider an alternative approach to this demand: 这次对话中有几个问题。首先,注意“你必须”这个词的使用。这种命令式的语气不仅令人感到贬低,而且是开始谈判或对话的最糟糕方式之一。还要注意,开发者用一个理由来反驳架构师的要求(通过业务层会更慢并且需要更多时间)。现在考虑一下对这个要求的另一种应对方式:
Architect: “Since change control is most important to us, we have formed a closed-layered architecture. This means all calls to the database need to come from the business layer.”
Developer: “OK, I get it, but in that case, how am I going to deal with the performance issues for simple queries?”
Notice here the architect is providing the justification for the demand that all requests need to go through the business layer of the application. Providing the justification or reason first is always a good approach. Most of the time, once a person hears something they disagree with, they stop listening. By stating the reason first, the architect is sure that the justification will be heard. Also notice the architect removed the personal nature of this demand. By not saying “you must” or “you need to,” the architect effectively turned the demand into a simple statement of fact (“this means…”). Now take a look at the developer’s response. Notice the conversation shifted from disagreeing with the layered architecture restrictions to a question about increasing performance for simple calls. Now the architect and developer can engage in a collaborative conversation to find ways to make simple queries faster while still preserving the closed layers in the architecture. 请注意,架构师在提供所有请求需要通过应用程序的业务层的要求的理由。首先提供理由总是一个好的方法。大多数时候,一旦一个人听到他们不同意的事情,他们就会停止倾听。通过首先陈述理由,架构师确保理由会被听到。还要注意,架构师去除了这一要求的个人性质。通过不说“你必须”或“你需要”,架构师有效地将要求转变为一个简单的事实陈述(“这意味着……”)。现在看看开发者的回应。注意,谈话从对分层架构限制的不同意见转向了关于提高简单调用性能的问题。现在,架构师和开发者可以进行合作对话,寻找在保持架构中封闭层的同时加快简单查询的方法。
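The closed-layer rule the architect describes can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the system from the conversation: the class names, the `find_customer` query, and the in-memory dictionary standing in for the database are all hypothetical.

```python
class PersistenceLayer:
    """Owns all database access; a dict stands in for the real database."""

    def __init__(self):
        self._customers = {1: {"name": "Ada", "status": "active"}}

    def find_customer(self, customer_id):
        return self._customers.get(customer_id)


class BusinessLayer:
    """The only layer that holds a reference to the persistence layer."""

    def __init__(self, persistence):
        self._persistence = persistence

    def get_customer_name(self, customer_id):
        customer = self._persistence.find_customer(customer_id)
        if customer is None:
            raise LookupError(f"no customer with id {customer_id}")
        return customer["name"]


class PresentationLayer:
    """Receives the business layer only, so it cannot reach the database."""

    def __init__(self, business):
        self._business = business

    def show_customer(self, customer_id):
        return f"Customer: {self._business.get_customer_name(customer_id)}"


if __name__ == "__main__":
    ui = PresentationLayer(BusinessLayer(PersistenceLayer()))
    print(ui.show_customer(1))
```

Because the presentation layer is never handed a persistence object, even a simple query has to pass through the business layer. Any change to how data is stored therefore stays isolated behind one boundary, which is the change-control benefit the architect cites.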
Another effective negotiation tactic when negotiating with a developer or trying to convince them to accept a particular design or architecture decision they disagree with is to have the developer arrive at the solution on their own. This creates a win-win situation where the architect cannot lose. For example, suppose an architect is choosing between two frameworks, Framework X and Framework Y. The architect sees that Framework Y doesn’t satisfy the security requirements for the system and so naturally chooses Framework X. A developer on the team strongly disagrees and insists that Framework Y would still be the better choice. Rather than argue the matter, the architect tells the developer that the team will use Framework Y if the developer can show how to address the security requirements if Framework Y is used. One of two things will happen:
The developer will fail trying to demonstrate that Framework Y will satisfy the security requirements and will understand firsthand that the framework cannot be used. By having the developer arrive at the solution on their own, the architect automatically gets buy-in and agreement for the decision to use Framework X by essentially making it the developer’s decision. This is a win. 开发者在尝试证明框架 Y 能满足安全要求时将会失败,并将亲身理解该框架无法使用。通过让开发者自己得出解决方案,架构师自动获得了对使用框架 X 的决策的支持和一致意见,实际上使其成为开发者的决定。这是一个胜利。
The developer finds a way to address the security requirements with Framework Y and demonstrates this to the architect. This is a win as well. In this case the architect missed something in Framework Y, and it also ended up being a better framework than the other one.
If a developer disagrees with a decision, have them arrive at the solution on their own. 如果开发人员不同意某个决定,让他们自己找到解决方案。
It’s really through collaboration with the development team that the architect is able to gain the respect of the team and form better solutions. The more developers respect an architect, the easier it will be for the architect to negotiate with those developers. 通过与开发团队的合作,架构师才能获得团队的尊重并形成更好的解决方案。开发人员越尊重架构师,架构师与这些开发人员的谈判就越容易。
The Software Architect as a Leader 软件架构师作为领导者
A software architect is also a leader, one who can guide a development team through the implementation of the architecture. We maintain that about 50% of being an effective software architect is having good people skills, facilitation skills, and leadership skills. In this section we discuss several key leadership techniques that an effective software architect can leverage to lead development teams.
The 4 C's of Architecture 架构的 4 个 C
Each day things seem to be getting more and more complex, whether it be increased complexity in business processes or increased complexity of systems and even architecture. Complexity exists within architecture as well as software development, and always will. Some architectures are very complex, such as ones supporting six nines of availability (99.9999%), which is equivalent to unplanned downtime of about 86 milliseconds a day, or 31.5 seconds of downtime per year. This sort of complexity is known as essential complexity; in other words, “we have a hard problem.”
One of the traps many architects fall into is adding unnecessary complexity to solutions, diagrams, and documentation. Architects (as well as developers) seem to love complexity. To quote Neal: 许多架构师陷入的一个陷阱是给解决方案、图表和文档增加不必要的复杂性。架构师(以及开发人员)似乎都喜欢复杂性。引用尼尔的话:
Developers are drawn to complexity like moths to a flame, frequently with the same result.
Consider the diagram in Figure 23-1 illustrating the major information flows for the backend processing systems at a very large global bank. Is this necessarily complex? No one knows the answer to this question because the architect has made it complex. This sort of complexity is called accidental complexity; in other words, “we have made a problem hard.” Architects sometimes do this to prove their worth when things seem too simple or to guarantee that they are always kept in the loop on discussions and decisions that are made regarding the business or architecture. Other architects do this to maintain job security. Whatever the reason, introducing accidental complexity into something that is not complex is one of the best ways to become an ineffective leader as an architect.
Figure 23-1. Introducing accidental complexity into a problem 图 23-1. 将意外复杂性引入问题
An effective way of avoiding accidental complexity is what we call the 4 C’s of architecture: communication, collaboration, clarity, and conciseness. These factors (illustrated in Figure 23-2) all work together to create an effective communicator and collaborator on the team. 避免意外复杂性的有效方法是我们所称的架构的 4 C:沟通、协作、清晰和简洁。这些因素(如图 23-2 所示)共同作用,创造出团队中有效的沟通者和合作者。
Figure 23-2. The 4 C’s of architecture 图 23-2. 架构的 4 个 C
As a leader, facilitator, and negotiator, it is vital that a software architect be able to effectively communicate in a clear and concise manner. It is equally important that an architect also be able to collaborate with developers, business stakeholders, and other architects to discuss and form solutions together. Focusing on the 4 C’s of architecture helps an architect gain the respect of the team and become the go-to person on the project that everyone comes to not only for questions, but also for advice, mentoring, coaching, and leadership.
Be Pragmatic, Yet Visionary 务实,但要有远见
An effective software architect must be pragmatic, yet visionary. Doing this is not as easy as it sounds and takes a fairly high level of maturity and significant practice to accomplish. To better understand this statement, consider the definition of a visionary: 一个有效的软件架构师必须务实而富有远见。做到这一点并不像听起来那么简单,需要相当高的成熟度和大量的实践才能实现。为了更好地理解这一说法,考虑一下“远见者”的定义:
Visionary 愿景者
Thinking about or planning the future with imagination or wisdom. 用想象力或智慧思考或规划未来。
Being a visionary means applying strategic thinking to a problem, which is exactly what an architect is supposed to do. Architecture is about planning for the future and making sure the architectural vitality (how valid an architecture is) remains that way for a long time. However, too many times, architects become too theoretical in their planning and designs, creating solutions that become too difficult to understand or even implement. Now consider the definition of being pragmatic: 作为一个有远见的人意味着将战略思维应用于问题,这正是架构师应该做的。架构是关于为未来进行规划,并确保架构的活力(架构的有效性)在很长一段时间内保持这种状态。然而,太多时候,架构师在他们的规划和设计中变得过于理论化,创造出难以理解甚至实施的解决方案。现在考虑务实的定义:
Pragmatic 务实
Dealing with things sensibly and realistically in a way that is based on practical rather than theoretical considerations. 以务实而非理论的考虑方式,理智和现实地处理事务。
While architects need to be visionaries, they also need to apply practical and realistic solutions. Being pragmatic is taking all of the following factors and constraints into account when creating an architectural solution: 虽然架构师需要具备远见,但他们也需要应用实用和现实的解决方案。务实是指在创建架构解决方案时考虑以下所有因素和限制:
Budget constraints and other cost-based factors 预算限制和其他基于成本的因素
Time constraints and other time-based factors 时间限制和其他基于时间的因素
Skill set and skill level of the development team 开发团队的技能组合和技能水平
Trade-offs and implications associated with an architecture decision 与架构决策相关的权衡和影响
Technical limitations of a proposed architectural design or solution 提议的架构设计或解决方案的技术限制
A good software architect is one that strives to find an appropriate balance between being pragmatic while still applying imagination and wisdom to solving problems (see Figure 23-3). For example, consider the situation where an architect is faced with a difficult problem dealing with elasticity (unknown sudden and significant increases in concurrent user load). A visionary might come up with an elaborate way to deal with this through the use of a complex data mesh, which is a collection of distributed, domain-based databases. In theory this might be a good approach, but being pragmatic means applying reason and practicality to the solution. For example, has the company ever used a data mesh before? What are the trade-offs of using a data mesh? Would this really solve the problem? 一个好的软件架构师是努力在务实与运用想象力和智慧解决问题之间找到适当平衡的人(见图 23-3)。例如,考虑一个架构师面临一个与弹性相关的困难问题(未知的突然和显著增加的并发用户负载)。一个有远见的人可能会通过使用复杂的数据网格来提出一个复杂的解决方案,数据网格是一个分布式的、基于领域的数据库集合。从理论上讲,这可能是一个好的方法,但务实意味着将理性和实用性应用于解决方案。例如,公司以前是否使用过数据网格?使用数据网格的权衡是什么?这真的能解决问题吗?
Figure 23-3. Good architects find the balance between being pragmatic, yet visionary 图 23-3. 优秀的架构师在务实与富有远见之间找到平衡
Maintaining a good balance between being pragmatic, yet visionary, is an excellent way of gaining respect as an architect. Business stakeholders will appreciate visionary solutions that fit within a set of constraints, and developers will appreciate having a practical (rather than theoretical) solution to implement.
A pragmatic architect would first look at what the limiting factor is when needing high levels of elasticity. Is it the database that’s the bottleneck? Maybe it’s a bottleneck with respect to some of the services invoked or other external sources needed. Finding and isolating the bottleneck would be a first practical approach to the problem. In fact, even if it is the database, could some of the data needed be cached so that the database need not be accessed at all?
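The caching idea can be illustrated with a small read-through cache placed in front of the slow lookup. This is a sketch under assumptions rather than a recommended design: the `CachedReader` class, its time-to-live policy, and the `load` callable standing in for the database query are all hypothetical, and a real system would also need to decide how stale the cached data is allowed to be.

```python
import time


class CachedReader:
    """Read-through cache: serve from memory, fall back to `load` on a miss."""

    def __init__(self, load, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load              # stand-in for the database query
        self._ttl = ttl_seconds        # how long an entry stays fresh
        self._clock = clock            # injectable so expiry can be tested
        self._entries = {}             # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Missing or expired: go to the database and remember the answer.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    def slow_lookup(key):
        time.sleep(0.05)               # simulate a slow database round trip
        return f"value-for-{key}"

    reader = CachedReader(slow_lookup, ttl_seconds=30)
    for _ in range(3):
        reader.get("product-42")
    print(f"hits={reader.hits} misses={reader.misses}")
```

The hit and miss counters matter as much as the cache itself: they let the architect show whether the database really was the bottleneck and how much load the cache removes under a sudden spike.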
Leading Teams by Example 以身作则领导团队
Bad software architects leverage their title to get people to do what they want them to do. Effective software architects get people to do things not by leveraging their title as architect, but by leading through example. This is all about gaining the respect of development teams, business stakeholders, and other people throughout the organization (such as the head of operations, development managers, and product owners).
The classic “lead by example, not by title” story involves a captain and a sergeant during a military battle. The high-ranking captain, who is largely removed from the troops, commands all of the troops to move forward during the battle to take a particularly difficult hill. However, rather than listen to the high-ranking captain, the soldiers, full of doubt, look over to the lower-ranking sergeant for whether they should take the hill or not. The sergeant looks at the situation, nods his head slightly, and the soldiers immediately move forward with confidence to take the hill. 经典的“以身作则,而非以头衔领导”的故事涉及一位军官和一位军士在军事战斗中的情景。这位高阶军官与部队关系疏远,在战斗中命令所有部队向前推进,攻占一座特别困难的山。然而,士兵们并没有听从这位高阶军官的命令,反而充满疑虑地看向低阶军士,询问他们是否应该攻占这座山。军士观察了情况,微微点头,士兵们立刻充满信心地向前推进,攻占了这座山。
The moral of this story is that rank and title mean very little when it comes to leading people. The computer scientist Gerald Weinberg is famous for saying, “No matter what the problem is, it’s a people problem.” Most people think that solving technical issues has nothing to do with people skills; it has to do with technical knowledge. While having technical knowledge is certainly necessary for solving a problem, it’s only a part of the overall equation for solving any problem. Suppose, for example, an architect is holding a meeting with a team of developers to solve an issue that’s come up in production. One of the developers makes a suggestion, and the architect responds with, “Well, that’s a dumb idea.” Not only will that developer not make any more suggestions, but none of the other developers will dare say anything. The architect in this case has effectively shut down the entire team from collaborating on the solution.
Gaining respect and leading teams is about basic people skills. Consider the following dialogue between an architect and a customer, client, or development team with regard to a performance issue in the application: 获得尊重和领导团队与基本的人际交往技巧有关。考虑以下架构师与客户、客户或开发团队之间关于应用程序性能问题的对话:
Developer: “So how are we going to solve this performance problem?” 开发者:“那么我们将如何解决这个性能问题?”
Architect: “What you need to do is use a cache. That would fix the problem.” 架构师:“你需要做的是使用缓存。这将解决问题。”
Developer: “Don’t tell me what to do.” 开发者:“别告诉我该做什么。”
Architect: “What I’m telling you is that it would fix the problem.” 架构师:“我告诉你的是,这将解决问题。”
By using the words “what you need to do is…” or “you must,” the architect is forcing their opinion onto the developer and essentially shutting down collaboration. This is a good example of using communication, not collaboration. Now consider the revised dialogue: 通过使用“你需要做的是……”或“你必须”的措辞,架构师正在将他们的观点强加给开发者,实际上是在关闭合作。这是一个使用沟通而非合作的好例子。现在考虑修订后的对话:
Developer: “So how are we going to solve this performance problem?” 开发者:“那么我们将如何解决这个性能问题?”
Architect: “Have you considered using a cache? That might fix the problem.” 架构师:“你考虑过使用缓存吗?那可能解决这个问题。”
Developer: “Hmmm, no we didn’t think about that. What are your thoughts?” 开发者:“嗯,我们没有考虑到这一点。你有什么想法?”
Architect: “Well, if we put a cache here…” 架构师:“好吧,如果我们在这里放一个缓存……”
Notice the use of the words “have you considered…” or “what about…” in the dialogue. Asking the question puts control back on the developer or client, creating a collaborative conversation where both the architect and developer are working together to form a solution. The use of grammar is vitally important when trying to build a collaborative environment. Being a leader as an architect is not only being able to collaborate with others to create an architecture, but also to help promote collaboration among the team by acting as a facilitator. As an architect, try to observe team dynamics and notice when situations like the first dialogue occur. Taking team members aside and coaching them on the use of grammar as a means of collaboration will not only create better team dynamics, but also help create respect among the team members.
Another basic people skills technique that can help build respect and healthy relationships between an architect and the development team is to always try to use the person’s name during a conversation or negotiation. Not only do people like hearing their name during a conversation, it also helps breed familiarity. Practice remembering people’s names, and use them frequently. Given that names are sometimes hard to pronounce, make sure to get the pronunciation correct, then practice that pronunciation until it is perfect. Whenever we ask someone’s name, we repeat it to the person and ask if that’s the correct way to pronounce it. If it’s not correct, we repeat this process until we get it right. 另一个基本的人际交往技巧,可以帮助建立建筑师与开发团队之间的尊重和健康关系,就是在对话或谈判中始终尝试使用对方的名字。人们不仅喜欢在对话中听到自己的名字,这也有助于培养熟悉感。练习记住人们的名字,并经常使用它们。考虑到名字有时很难发音,确保正确发音,然后练习这个发音直到完美。每当我们询问某人的名字时,我们会将其重复给对方,并询问这是否是正确的发音。如果不正确,我们会重复这个过程,直到我们说对为止。
If an architect meets someone for the first time or only occasionally, always shake the person’s hand and make eye contact. A handshake is an important people skill that goes back to medieval times. The physical bond that occurs during a simple handshake lets both people know they are friends, not foes, and forms a bond between them. However, it is sometimes hard to get a simple handshake right. 如果一个建筑师第一次见到某人或只是偶尔见面,始终要握手并进行眼神交流。握手是一项重要的人际交往技巧,可以追溯到中世纪。简单握手时产生的身体联系让双方知道他们是朋友,而不是敌人,并在他们之间形成了一种联系。然而,有时很难做到一个简单的握手。
When shaking someone’s hand, give a firm (but not overpowering) handshake while looking the person in the eye. Looking away while shaking someone’s hand is a sign of disrespect, and most people will notice that. Also, don’t hold on to the handshake too long. A simple two- to three-second, firm handshake is all that is needed to start off a conversation or to greet someone. There is also the issue of going overboard with the handshake technique and making the other person uncomfortable enough to not want to communicate or collaborate with you. For example, imagine a software architect who comes in every morning and starts shaking everyone’s hand. Not only is this a little weird, it creates an uncomfortable situation. However, imagine a software architect who must meet with the head of operations monthly. This is the perfect opportunity to stand up, say “Hello Ruth, nice seeing you again,” and give a quick, firm handshake. Knowing when to do a handshake and when not to is part of the complex art of people skills.
A software architect as a leader, facilitator, and negotiator should be careful to preserve the boundaries that exist between people at all levels. The handshake, as described previously, is an effective and professional technique of forming a physical bond with the person you are communicating or collaborating with. However, while a handshake is good, a hug in a professional setting, regardless of the environment, is not. An architect might think that it exemplifies more physical connection and bonding, but all it does is sometimes make the other person at work more uncomfortable and, more importantly, can lead to potential harassment issues within the workplace. Skip the hugs altogether, regardless of the professional environment, and stick with the handshake instead (unless of course everyone in the company hugs each other, which would just be…weird).
Sometimes it’s best to turn a request into a favor as a way of getting someone to do something they otherwise might not want to do. In general, people do not like to be told what to do, but for the most part, people want to help others. This is basic human nature. Consider the following conversation between an architect and developer regarding an architecture refactoring effort during a busy iteration: 有时候,将请求转化为一个人情是让某人做他们可能不想做的事情的最佳方式。一般来说,人们不喜欢被告知该做什么,但大多数情况下,人们想要帮助他人。这是基本的人性。考虑以下建筑师和开发人员之间关于在繁忙迭代期间进行架构重构努力的对话:
Architect: “I’m going to need you to split the payment service into five different services, with each service containing the functionality for each type of payment we accept, such as store credit, credit card, PayPal, gift card, and reward points, to provide better fault tolerance and scalability in the website. It shouldn’t take too long.” 架构师:“我需要你将支付服务拆分成五个不同的服务,每个服务包含我们接受的每种支付方式的功能,例如商店积分、信用卡、PayPal、礼品卡和奖励积分,以提供更好的容错能力和可扩展性。应该不会花太长时间。”
Developer: “No way, man. Way too busy this iteration for that. Sorry, can’t do it.” 开发者:“没办法,伙计。这一轮忙得不可开交。抱歉,做不到。”
Architect: “Listen, this is important and needs to be done this iteration.” 架构师:“听着,这很重要,需要在这一迭代中完成。”
Developer: “Sorry, no can do. Maybe one of the other developers can do it. I’m just too busy.” 开发者:“抱歉,做不到。也许其他开发者可以做到。我实在太忙了。”
Notice the developer’s response. It is an immediate rejection of the task, even though the architect justified it through better fault tolerance and scalability. In this case, notice that the architect is telling the developer to do something they are simply too busy to do. Also notice the demand doesn’t even include the person’s name! Now consider the technique of turning the request into a favor: 注意开发者的反应。这是对任务的立即拒绝,尽管架构师通过更好的容错性和可扩展性为其辩护。在这种情况下,请注意架构师告诉开发者去做一些他们根本没有时间做的事情。还要注意,这个要求甚至没有包括这个人的名字!现在考虑将请求转变为一个请求的技巧:
Architect: “Hi, Sridhar. Listen, I’m in a real bind. I really need to have the payment service split into separate services for each payment type to get better fault tolerance and scalability, and I waited too long to do it. Is there any way you can squeeze this into this iteration? It would really help me out.” 架构师:“嗨,Sridhar。听着,我现在真的很困难。我真的需要将支付服务拆分为每种支付类型的独立服务,以获得更好的容错性和可扩展性,而我等得太久了。你能否把这个挤进这个迭代中?这对我真的很有帮助。”
Developer: “(Pause)…I’m really busy this iteration, but I guess so. I’ll see what I can do.” 开发者:“(停顿)……我这次迭代真的很忙,但我想是的。我会看看我能做些什么。”
Architect: “Thanks, Sridhar, I really appreciate the help. I owe you one.” 建筑师:“谢谢你,Sridhar,我真的很感激你的帮助。我欠你一个人情。”
Developer: “No worries, I’ll see that it gets done this iteration.” 开发者:“没问题,我会确保在这一迭代中完成。”
First, notice the use of the person’s name repeatedly throughout the conversation. Using the person’s name makes the conversation more personal and familiar rather than an impersonal professional demand. Second, notice the architect admits they are in a “real bind” and that splitting the services would really “help them out.” This technique does not always work, but playing off of the basic human nature of helping each other has a better probability of success than the approach in the first conversation. Try this technique the next time you face this sort of situation and see the results. In most cases, the results will be much more positive than telling someone what to do.
To lead a team and become an effective leader, a software architect should try to become the go-to person on the team: the person developers go to for their questions and problems. An effective software architect will seize the opportunity and take the initiative to lead the team, regardless of their title or role on the team. When a software architect observes someone struggling with a technical issue, they should step in and offer help or guidance. The same is true for nontechnical situations as well. Suppose an architect observes a team member that comes into work looking sort of depressed and bothered; clearly something is up. In this circumstance, an effective software architect would notice the situation and offer to talk, with something like, “Hey, Antonio, I’m heading over to get some coffee. Why don’t we head over together?” and then during the walk ask if everything is OK. This at least provides an opening for more of a personal discussion and, at its best, a chance to mentor and coach at a more personal level. However, an effective leader will also recognize times to not be too pushy and will back off by reading various verbal signs and facial expressions.
Another technique to start gaining respect as a leader and become the go-to person on the team is to host periodic brown-bag lunches to talk about a specific technique or technology. Everyone reading this book has a particular skill or knowledge that others don’t have. By hosting a periodic brown-bag lunch session, the architect is able not only to exhibit their technical prowess, but also to practice speaking and mentoring skills. For example, host a lunch session reviewing design patterns or the latest features of a programming language release. Not only does this provide valuable information to developers, but it also starts identifying you as a leader and mentor on the team.
Integrating with the Development Team
An architect’s calendar is usually filled with meetings, with most of those meetings overlapping with other meetings, such as the calendar shown in Figure 23-4. If this is what a software architect’s calendar looks like, then when does the architect have the time to integrate with the development team, help guide and mentor them, and be available for questions or concerns when they come up? Unfortunately, meetings are a necessary evil within the information technology world. They happen frequently, and will always happen.
Figure 23-4. A typical calendar of a software architect
The key to being an effective software architect is making more time for the development team, and this means controlling meetings. There are two types of meetings an architect can be involved in: those imposed upon the architect (the architect is invited to a meeting), and those imposed by the architect (the architect is calling the meeting). These meeting types are illustrated in Figure 23-5.
Figure 23-5. Meeting types
Imposed-upon meetings are the hardest to control. Due to the number of stakeholders a software architect must communicate and collaborate with, architects are invited to almost every meeting that gets scheduled. When invited to a meeting, an effective software architect should always ask the meeting organizer why they are needed in that meeting. Many times architects get invited to meetings simply to keep them in the loop on the information being discussed. That’s what meeting notes are for. By asking why, an architect can start to qualify which meetings they should attend and which ones they can skip. Another related technique to help reduce the number of meetings an architect is involved in is to ask for the meeting agenda before accepting a meeting invite. The meeting organizer may feel that the architect is necessary, but by looking at the agenda, the architect can qualify whether they really need to be in the meeting or not. Also, many times it is not necessary to attend the entire meeting. By reviewing the agenda, an architect can optimize their time by either showing up when relevant information is being discussed or leaving after the relevant discussion is over. Don’t waste time in a meeting if you could be spending that time working with the development team.
Ask for the meeting agenda ahead of time to help qualify whether you are really needed at the meeting or not.
Another effective technique to keep a development team on track and to gain their respect is to take one for the team when developers are invited to a meeting as well. Rather than having the tech lead attend the meeting, go in their place, particularly if both the tech lead and architect are invited. This keeps the development team focused on the task at hand rather than continually attending meetings. While deflecting meetings away from useful team members increases the time an architect spends in meetings, it does increase the development team’s productivity.
Meetings that an architect imposes upon others (the architect calls the meeting) are also a necessity at times but should be kept to an absolute minimum. These are the kinds of meetings an architect has control over. An effective software architect will always ask whether the meeting they are calling is more important than the work they are pulling their team members away from. Many times an email is all that is required to communicate some important information, which saves everyone tons of wasted time. When calling a meeting as an architect, always set an agenda and stick to it. Too often, meetings an architect calls get derailed due to some other issue, and that other issue may not be relevant to everyone else in the meeting. Also, as an architect, pay close attention to developer flow and be sure not to disrupt it by calling a meeting. Flow is a state of mind developers frequently get into where the brain gets 100% engaged in a particular problem, allowing full attention and maximum creativity. For example, a developer might be working on a particularly difficult algorithm or piece of code, and literally hours go by while it seems only minutes have passed. When calling a meeting, an architect should always try to schedule it first thing in the morning, right after lunch, or toward the end of the day, and not in the stretches in between, when most developers experience flow state.
Aside from managing meetings, another thing an effective software architect can do to integrate better with the development team is to sit with that team. Sitting in a cubicle away from the team sends the message that the architect is special, and the physical walls surrounding the cubicle send a distinct message that the architect is not to be bothered or disturbed. Sitting alongside a development team sends the message that the architect is an integral part of the team and is available for questions or concerns as they arise. By physically showing that they are part of the development team, the architect gains more respect and is better able to help guide and mentor the team.
Sometimes it is not possible for an architect to sit with a development team. In these cases the best thing an architect can do is continually walk around and be seen. Architects who are stuck on a different floor, or who are always in their offices or cubicles and never seen, cannot possibly help guide the development team through the implementation of the architecture. Block off time in the morning, after lunch, or late in the day and make the time to converse with the development team, help with issues, answer questions, and do basic coaching and mentoring. Development teams appreciate this type of communication and will respect you for making time for them during the day. The same holds true for other stakeholders. Stopping in to say hi to the head of operations while on the way to get more coffee is an excellent way of keeping communication open with business and other key stakeholders.
Summary
The negotiation and leadership tips presented in this chapter are meant to help the software architect form a better collaborative relationship with the development team and other stakeholders. These are skills an architect must have in order to become an effective software architect. While the tips in this chapter are a good start on the journey toward becoming a more effective leader, perhaps the best tip of all comes from a quote by Theodore Roosevelt, the 26th US president:
The most important single ingredient in the formula of success is knowing how to get along with people.
-Theodore Roosevelt
CHAPTER 24
Developing a Career Path
Becoming an architect takes time and effort, but based on the many reasons we’ve outlined throughout this book, managing a career path after becoming an architect is equally tricky. While we can’t chart a specific career path for you, we can point you to some practices that we have seen work well.
An architect must continue to learn throughout their career. The technology world changes at a dizzying pace. One of Neal’s former coworkers was a world-renowned expert in Clipper. He lamented that he couldn’t take the enormous body of (now useless) Clipper knowledge and replace it with something else. He also speculated (and this is still an open question): has any group in history learned and thrown away as much detailed knowledge within their lifetimes as software developers?
Each architect should keep an eye out for relevant resources, both technology and business, and add them to their personal stockpile. Unfortunately, resources come and go all too quickly, which is why we don’t list any in this book. Talking to colleagues or experts about what resources they use to keep current is one good way of seeking out the latest newsfeeds, websites, and groups that are active in a particular area of interest. Architects should also build some time into their day to maintain breadth using those resources.
The 20-Minute Rule
As illustrated in Figure 2-6, technology breadth is more important to architects than depth. However, maintaining breadth takes time and effort, something architects should build into their day. But how in the world does anyone have the time to actually go to various websites to read articles, watch presentations, and listen to podcasts? The answer is…not many do. Developers and architects alike struggle with the balance of working a regular job, spending time with the family, being available for our children, carving out personal time for interests and hobbies, and trying to develop careers, while at the same time trying to keep up with the latest trends and buzzwords.
One technique we use to maintain this balance is something we call the 20-minute rule. The idea of this technique, as illustrated in Figure 24-1, is to devote at least 20 minutes a day to your career as an architect by learning something new or diving deeper into a specific topic. Figure 24-1 shows examples of the types of resources to spend 20 minutes a day on, such as InfoQ, DZone Refcardz, and the ThoughtWorks Technology Radar. Spend that minimum of 20 minutes to Google some unfamiliar buzzwords (“the things you don’t know you don’t know” from Chapter 2) to learn a little about them, promoting that knowledge into the “things you know you don’t know.” Or maybe spend the 20 minutes going deeper into a particular topic to gain a little more knowledge about it. The point of this technique is to carve out some time for developing a career as an architect and continuously gaining technical breadth.
Figure 24-1. The 20-minute rule
Many architects embrace this concept and plan to spend 20 minutes at lunch or in the evening after work to do this. What we have experienced is that this rarely works. Lunchtime gets shorter and shorter, becoming more of a catch-up time at work rather than a time to take a break and eat. Evenings are even worse: situations change, plans get made, family time becomes more important, and the 20-minute rule never happens.
We strongly recommend leveraging the 20-minute rule first thing in the morning, as the day is starting. However, there is a caveat to this advice as well. For example, what is the first thing an architect does after getting to work in the morning? Well, the very first thing the architect does is to get that wonderful cup of coffee or tea. OK, in that case, what is the second thing every architect does after getting that necessary coffee or tea? Check email. Once an architect checks email, diversion happens, email responses are written, and the day is over. Therefore, our strong recommendation is to invoke the 20-minute rule first thing in the morning, right after grabbing that cup of coffee or tea and before checking email. Go in to work a little early. Doing this will increase an architect’s technical breadth and help develop the knowledge required to become an effective software architect.
Developing a Personal Radar
For most of the ’90s and the beginning of the ’00s, Neal was the CTO of a small training and consulting company. When he started there, the primary platform was Clipper, which was a rapid-application development tool for building DOS applications atop dBASE files. Until one day it vanished. The company had noticed the rise of Windows, but the business market was still DOS…until it abruptly wasn’t. That lesson left a lasting impression: ignore the march of technology at your peril.
It also taught an important lesson about technology bubbles. When heavily invested in a technology, a developer lives in a memetic bubble, which also serves as an echo chamber. Bubbles created by vendors are particularly dangerous, because developers never hear honest appraisals from within the bubble. But the biggest danger of Bubble Living comes when it starts collapsing, which developers never notice from the inside until it’s too late.
What they lacked was a technology radar: a living document to assess the risks and rewards of existing and nascent technologies. The radar concept comes from ThoughtWorks; first, we’ll describe how this concept came to be and then how to use it to create a personal radar.
The ThoughtWorks Technology Radar
The ThoughtWorks Technology Advisory Board (TAB) is a group of senior technology leaders within ThoughtWorks, created to assist the CTO, Dr. Rebecca Parsons, in deciding technology directions and strategies for the company and its clients. This group meets face-to-face twice a year. One of the outcomes of a face-to-face meeting was the Technology Radar. Over time, it gradually grew into the biannual Technology Radar.
The TAB gradually settled into a twice-a-year rhythm of Radar production. Then, as often happens, unexpected side effects occurred. At some of the conferences Neal spoke at, attendees sought him out, thanked him for helping produce the Radar, and said that their company had started producing its own version of it.
Neal also realized that this was the answer to a pervasive question at conference speaker panels everywhere: “How do you (the speakers) keep up with technology? How do you figure out what things to pursue next?” The answer, of course, is that they all have some form of internal radar.
Parts
The ThoughtWorks Radar consists of four quadrants that attempt to cover most of the software development landscape:
Tools
Tools in the software development space, everything from developer tools like IDEs to enterprise-grade integration tools
Languages and frameworks
Computer languages, libraries, and frameworks, typically open source
Techniques
Any practice that assists software development overall; this may include software development processes, engineering practices, and advice
Platforms
Technology platforms, including databases, cloud vendors, and operating systems
Rings
The Radar has four rings, listed here from outer to inner:
Hold
The original intent of the hold ring was “hold off for now,” to represent technologies that were too new to reasonably assess yet: technologies that were getting lots of buzz but weren’t yet proven. The hold ring has evolved into indicating “don’t start anything new with this technology.” There’s no harm in using it on existing projects, but developers should think twice about using it for new development.
Assess
The assess ring indicates that a technology is worth exploring with the goal of understanding how it will affect an organization. Architects should invest some effort (such as development spikes, research projects, and conference sessions) to see if it will have an impact on the organization. For example, many large companies visibly went through this phase when formulating a mobile strategy.
Trial
The trial ring is for technologies worth pursuing; it is important to understand how to build up this capability. Now is the time to pilot a low-risk project so that architects and developers can really understand the technology.
Adopt
For technologies in the adopt ring, ThoughtWorks feels strongly that the industry should adopt those items.
An example view of the Radar appears in Figure 24-2.
Figure 24-2. A sample ThoughtWorks Technology Radar
In Figure 24-2, each blip represents a different technology or technique, with an associated short write-up. While ThoughtWorks uses the radar to broadcast its opinions about the software world, many developers and architects also use it as a way of structuring their technology assessment process. Architects can use the tool described in “Open Source Visualization Bits” on page 371 to build the same visuals used by ThoughtWorks as a way to organize their thinking about what to invest time in.
When using the radar for personal use, we suggest altering the meanings of the rings to the following:
Hold
An architect can include not only technologies and techniques to avoid, but also habits they are trying to break. For example, an architect from the .NET world may be accustomed to reading the latest news/gossip on forums about team internals. While entertaining, it may be a low-value information stream. Placing that in hold forms a reminder for an architect to avoid problematic things.
Assess
Architects should use assess for promising technologies that they have heard good things about but haven’t had time to assess for themselves yet (see “Using Social Media” on page 371). This ring forms a staging area for more serious research at some time in the future.
Trial
The trial ring indicates active research and development, such as an architect performing spike experiments within a larger code base. This ring represents technologies worth spending time on to understand more deeply so that an architect can perform an effective trade-off analysis.
Adopt
The adopt ring represents the new things an architect is most excited about and best practices for solving particular problems.
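As a rough illustration of how the quadrants and rings described above might be captured for a personal radar, the following Python sketch models each blip as a small record and groups the blips by ring. The class names, the `group_by_ring` helper, and the sample entries are all hypothetical and chosen only for illustration; they are not part of any ThoughtWorks tool.

```python
from dataclasses import dataclass
from enum import Enum


class Ring(Enum):
    # Listed from the inner ring to the outer ring.
    ADOPT = 1
    TRIAL = 2
    ASSESS = 3
    HOLD = 4


class Quadrant(Enum):
    TOOLS = "Tools"
    LANGUAGES_AND_FRAMEWORKS = "Languages and frameworks"
    TECHNIQUES = "Techniques"
    PLATFORMS = "Platforms"


@dataclass(frozen=True)
class Blip:
    """One entry on the radar (hypothetical structure)."""
    name: str
    quadrant: Quadrant
    ring: Ring
    note: str = ""


def group_by_ring(blips):
    """Return a dict mapping every Ring to the sorted names of its blips."""
    grouped = {ring: [] for ring in Ring}
    for blip in blips:
        grouped[blip.ring].append(blip.name)
    return {ring: sorted(names) for ring, names in grouped.items()}


# Hypothetical entries, for illustration only.
my_radar = [
    Blip("Architecture decision records", Quadrant.TECHNIQUES, Ring.ADOPT),
    Blip("Some new language", Quadrant.LANGUAGES_AND_FRAMEWORKS, Ring.ASSESS),
    Blip("Reading forum gossip", Quadrant.TECHNIQUES, Ring.HOLD, "low-value habit"),
]

for ring, names in group_by_ring(my_radar).items():
    print(f"{ring.name:<6} {', '.join(names) or '-'}")
```

Keeping the radar in a plain structure like this makes it easy to review periodically and to move a blip from one ring to another as an assessment changes.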
It is dangerous to adopt a laissez-faire attitude toward a technology portfolio. Most technologists pick technologies on a more or less ad hoc basis, based on what’s cool or what their employer is driving. Creating a technology radar helps an architect formalize their thinking about technology and balance opposing decision criteria (such as a “cooler” technology with fewer job prospects versus a technology with a huge job market but less interesting work). Architects should treat their technology portfolio like a financial portfolio: in many ways, they are the same thing. What does a financial planner tell people about their portfolio? Diversify!
Architects should choose some technologies and/or skills that are widely in demand and track that demand. But they might also want to try some technology gambits, like open source or mobile development. Anecdotes abound about developers who freed themselves from cubicle-dwelling servitude by working late at night on open source projects that became popular, purchasable, and eventually, career destinations. This is yet another reason to focus on breadth rather than depth.
Architects should set aside time to broaden their technology portfolio, and building a radar provides a good scaffolding. However, the exercise is more important than the outcome. Creating the visualization provides an excuse to think about these things, and, for busy architects, finding an excuse to carve out time in a busy schedule is the only way this kind of thinking can occur.
Open Source Visualization Bits
By popular demand, ThoughtWorks released a tool in November 2016 to assist technologists in building their own radar visualization. When ThoughtWorks does this exercise for companies, they capture the output of the meeting in a spreadsheet, with a page for each quadrant. The ThoughtWorks Build Your Own Radar tool uses a Google spreadsheet as input and generates the radar visualization using an HTML5 canvas. Thus, while the important part of the exercise is the conversations it generates, it also generates useful visualizations.
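Because the tool takes a spreadsheet as input, a personal radar can be kept as simple tabular data. The sketch below writes radar entries as CSV text that could be pasted into a spreadsheet. The column names (`name`, `ring`, `quadrant`, `isNew`, `description`) are an assumption about the expected layout, and the entries are hypothetical; check the tool’s own documentation for the exact format before relying on it.

```python
import csv
import io

# Assumed column layout; verify against the tool's documentation before use.
COLUMNS = ["name", "ring", "quadrant", "isNew", "description"]


def radar_to_csv(entries):
    """Serialize a list of dicts (keyed by COLUMNS) into CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for entry in entries:
        writer.writerow(entry)
    return buffer.getvalue()


# Hypothetical entries, for illustration only.
entries = [
    {
        "name": "Architecture decision records",
        "ring": "adopt",
        "quadrant": "Techniques",
        "isNew": "FALSE",
        "description": "Record the why behind each decision",
    },
    {
        "name": "Some new language",
        "ring": "assess",
        "quadrant": "Languages and frameworks",
        "isNew": "TRUE",
        "description": "Heard good things; not yet evaluated",
    },
]

csv_text = radar_to_csv(entries)
print(csv_text)
```

Using the standard `csv` module rather than joining strings by hand means that descriptions containing commas or quotes are escaped correctly.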
Using Social Media
Where can an architect find new technologies and techniques to put in the assess ring of their radar? In Andrew McAfee’s book Enterprise 2.0 (Harvard Business Review Press), he makes an interesting observation about social media and social networks in general. When thinking about a person’s network of contacts, three categories exist, as illustrated in Figure 24-3.
Figure 24-3. Social circles of a person’s relationships
In Figure 24-3, strong links represent family members, coworkers, and other people whom a person regularly contacts. One litmus test for how close these connections are: a person can tell you what someone in their strong links had for lunch at least one day last week. Weak links are casual acquaintances, distant relatives, and other people seen only a few times a year. Before social media, it was difficult to keep up with this circle of people. Finally, potential links represent people you haven’t met yet.
McAfee’s most interesting observation about these connections was that someone’s next job is more likely to come from a weak link than a strong one. Strongly linked people know everything within the strongly linked group; these are people who see each other all the time. Weak links, on the other hand, offer advice from outside someone’s normal experience, including new job offers.
Using the characteristics of social networks, architects can utilize social media to enhance their technical breadth. When using social media like Twitter professionally, architects should find technologists whose advice they respect and follow them. This allows an architect to build a network around new, interesting technologies to assess and to keep up with the rapid changes in the technology world.
Parting Words of Advice
How do we get great designers? Great designers design, of course.
-Fred Brooks
So how are we supposed to get great architects, if they only get the chance to architect fewer than a half-dozen times in their career?
-Ted Neward
Practice is the proven way to build skills and become better at anything in life…including architecture. We encourage new and existing architects to keep honing their skills, both for individual technology breadth and for the craft of designing architecture. To that end, check out the architecture katas on the companion website for the book. They are modeled after the katas used as examples here, and we encourage architects to use them to practice building skills in architecture.
A common question we get about katas: is there an answer guide somewhere? Unfortunately, such an answer key does not exist. To quote your author, Neal:
There are no right or wrong answers in architecture, only trade-offs.
When we started using the architecture katas exercise during live training classes, we initially kept the drawings the students produced with the goal of creating an answer repository. We quickly gave up, though, because we realized that we had incomplete artifacts. In other words, the teams had captured the topology and explained their decisions in class but didn’t have the time to create architecture decision records. While how they implemented their solutions was interesting, the why was much more interesting because it contained the trade-offs they considered in making each decision. Keeping just the how was only half of the story. So, our last parting words of advice: always learn, always practice, and go do some architecture!
Self-Assessment Questions
Chapter 1: Introduction
What are the four dimensions that define software architecture?
What is the difference between an architecture decision and a design principle?
List the eight core expectations of a software architect.
What is the First Law of Software Architecture?
Chapter 2: Architectural Thinking
Describe the traditional approach of architecture versus development and explain why that approach no longer works.
List the three levels of knowledge in the knowledge triangle and provide an example of each.
Why is it more important for an architect to focus on technical breadth rather than technical depth?
What are some of the ways of maintaining your technical depth and remaining hands-on as an architect?
Chapter 3: Modularity
What is meant by the term connascence?
What is the difference between static and dynamic connascence?
What does connascence of type mean? Is it static or dynamic connascence?
What is the strongest form of connascence?
What is the weakest form of connascence?
Which is preferred within a code base: static or dynamic connascence?
Chapter 4: Architecture Characteristics Defined
What three criteria must an attribute meet to be considered an architecture characteristic?
What is the difference between an implicit characteristic and an explicit one? Provide an example of each.
Provide an example of an operational characteristic.
Provide an example of a structural characteristic.
Provide an example of a cross-cutting characteristic.
Which architecture characteristic is more important to strive for: availability or performance?
Chapter 5: Identifying Architecture Characteristics
Give a reason why it is a good practice to limit the number of characteristics (“ilities”) an architecture should support.
True or false: most architecture characteristics come from business requirements and user stories.
If a business stakeholder states that time-to-market (i.e., getting new features and bug fixes pushed out to users as fast as possible) is the most important business concern, which architecture characteristics would the architecture need to support?
What is the difference between scalability and elasticity?
You find out that your company is about to undergo several major acquisitions to significantly increase its customer base. Which architectural characteristics should you be worried about?
Chapter 6: Measuring and Governing Architecture Characteristics
Why is cyclomatic complexity such an important metric to analyze for architecture?
What is an architecture fitness function? How can fitness functions be used to analyze an architecture?
Provide an example of an architecture fitness function to measure the scalability of an architecture.
What is the most important criterion an architecture characteristic must meet to allow architects and developers to create fitness functions?
Chapter 7: Scope of Architecture Characteristics
What is an architectural quantum, and why is it important to architecture?
Assume a system consisting of a single user interface with four independently deployed services, each containing its own separate database. Would this system have a single quantum or four quanta? Why?
Assume a system with an administration portion managing static reference data (such as the product catalog and warehouse information) and a customer-facing portion managing the placement of orders. How many quanta should this system be and why? If you envision multiple quanta, could the admin quantum and customer-facing quantum share a database? If so, in which quantum would the database need to reside?
Chapter 8: Component-Based Thinking
We define the term component as a building block of an application (something the application does). A component usually consists of a group of classes or source files. How are components typically manifested within an application or service?
What is the difference between technical partitioning and domain partitioning? Provide an example of each.
What is the advantage of domain partitioning?
Under what circumstances would technical partitioning be a better choice over domain partitioning?
What is the entity trap? Why is it not a good approach for component identification?
When might you choose the workflow approach over the Actor/Actions approach when identifying core components?
Chapter 9: Architecture Styles
List the eight fallacies of distributed computing.
Name three challenges that distributed architectures have that monolithic architectures don’t.
What is stamp coupling?
What are some ways of addressing stamp coupling?
Chapter 10: Layered Architecture
What is the difference between an open layer and a closed layer?
Describe the layers of isolation concept and what the benefits are of this concept.
What is the architecture sinkhole anti-pattern?
What are some of the main architecture characteristics that would drive you to use a layered architecture?
Why isn’t testability well supported in the layered architecture style?
Why isn’t agility well supported in the layered architecture style?
Chapter 11: Pipeline Architecture
Can pipes be bidirectional in a pipeline architecture?
Name the four types of filters and their purpose.
Can a filter send data out through multiple pipes?
Is the pipeline architecture style technically partitioned or domain partitioned?
In what way does the pipeline architecture support modularity?
Provide two examples of the pipeline architecture style.
Chapter 12: Microkernel Architecture
What is another name for the microkernel architecture style?
Under what situations is it OK for plug-in components to be dependent on other plug-in components?
What are some of the tools and frameworks that can be used to manage plug-ins?
What would you do if you had a third-party plug-in that didn’t conform to the standard plug-in contract in the core system?
Provide two examples of the microkernel architecture style.
Is the microkernel architecture style technically partitioned or domain partitioned?
Why is the microkernel architecture always a single architecture quantum?
What is domain/architecture isomorphism?
Chapter 14: Event-Driven Architecture
What are the primary differences between the broker and mediator topologies?
For better workflow control, would you use the mediator or broker topology?
Does the broker topology usually leverage a publish-and-subscribe model with topics or a point-to-point model with queues?
Name two primary advantages of asynchronous communications.
Give an example of a typical request within the request-based model.
Give an example of a typical request in an event-based model.
What is the difference between an initiating event and a processing event in event-driven architecture?
What are some of the techniques for preventing data loss when sending and receiving messages from a queue?
What are three main driving architecture characteristics for using event-driven architecture?
What are some of the architecture characteristics that are not well supported in event-driven architecture?
Chapter 15: Space-Based Architecture
Where does space-based architecture get its name from?
What is a primary aspect of space-based architecture that differentiates it from other architecture styles?
Name the four components that make up the virtualized middleware within a space-based architecture.
What is the role of the messaging grid?
What is the role of a data writer in space-based architecture?
Under what conditions would a service need to access data through the data reader?
Does a small cache size increase or decrease the chances for a data collision?
What is the difference between a replicated cache and a distributed cache? Which one is typically used in space-based architecture?
List three of the most strongly supported architecture characteristics in space-based architecture.
Why does testability rate so low for space-based architecture?
Chapter 17: Microservices Architecture
Why is the bounded context concept so critical for microservices architecture?
What are three ways of determining if you have the right level of granularity in a microservice?
What functionality might be contained within a sidecar?
What is the difference between orchestration and choreography? Which does microservices support? Is one communication style easier in microservices?
What is a saga in microservices?
Why are agility, testability, and deployability so well supported in microservices?
What are two reasons performance is usually an issue in microservices?
Is microservices a domain-partitioned architecture or a technically partitioned one?
Describe a topology where a microservices ecosystem might be only a single quantum.
How was domain reuse addressed in microservices? How was operational reuse addressed?
Chapter 18: Choosing the Appropriate Architecture Style
In what way does the data architecture (structure of the logical and physical data models) influence the choice of architecture style?
How does it influence your choice of architecture style to use?
Delineate the steps an architect uses to determine style of architecture, data partitioning, and communication styles.
What factor leads an architect toward a distributed architecture?
Chapter 19: Architecture Decisions
What is the covering your assets anti-pattern?
What are some techniques for avoiding the email-driven architecture anti-pattern?
What are the five factors Michael Nygard defines for identifying something as architecturally significant?
What are the five basic sections of an architecture decision record?
In which section of an ADR do you typically add the justification for an architecture decision?
Assuming you don’t need a separate Alternatives section, in which section of an ADR would you list the alternatives to your proposed solution?
What are three basic criteria under which you would mark the status of an ADR as Proposed?
Chapter 20: Analyzing Architecture Risk
What are the two dimensions of the risk assessment matrix?
What are some ways to show direction of particular risk within a risk assessment? Can you think of other ways to indicate whether risk is getting better or worse?
Why is it necessary for risk storming to be a collaborative exercise?
Why is it necessary for the identification activity within risk storming to be an individual activity and not a collaborative one?
What would you do if three participants identified risk as high (6) for a particular area of the architecture, but another participant identified it as only medium (3)?
What risk rating (1-9) would you assign to unproven or unknown technologies?
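The risk questions above refer to ratings on a 1-9 scale, with 6 described as high and 3 as medium. A minimal sketch of such a scale follows, assuming the rating is the product of two dimensions that are each scored from 1 to 3. The dimension names (impact and likelihood), the function names, and the low/medium/high cut-offs are assumptions made for this illustration; they are chosen to agree with the "high (6)" and "medium (3)" values quoted in the questions.

```python
def risk_rating(impact: int, likelihood: int) -> int:
    """Combine two 1-3 scores into a single 1-9 risk rating."""
    for name, value in (("impact", impact), ("likelihood", likelihood)):
        if value not in (1, 2, 3):
            raise ValueError(f"{name} must be 1, 2, or 3, got {value!r}")
    return impact * likelihood


def risk_level(rating: int) -> str:
    """Classify a rating; the cut-offs here are assumed for the sketch."""
    if rating not in (1, 2, 3, 4, 6, 9):
        raise ValueError(f"{rating!r} is not a product of two 1-3 scores")
    if rating <= 2:
        return "low"
    if rating <= 4:
        return "medium"
    return "high"
```

Under these assumptions, a disagreement such as three participants rating an area 6 and one rating it 3 shows up as a difference in level (high versus medium), which is the kind of gap a consensus discussion would need to resolve.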
Chapter 21: Diagramming and Presenting Architecture
What is irrational artifact attachment, and why is it significant with respect to documenting and diagramming architecture?
What do the 4 C’s refer to in the C4 modeling technique?
When diagramming architecture, what do dotted lines between components mean?
What is the bullet-riddled corpse anti-pattern? How can you avoid this anti-pattern when creating presentations?
What are the two primary information channels a presenter has when giving a presentation?
Chapter 22: Making Teams Effective
What are three types of architecture personalities? What type of boundary does each personality create?
What are the five factors that go into determining the level of control you should exhibit on the team?
What are three warning signs you can look at to determine if your team is getting too big?
List three basic checklists that would be good for a development team.
Chapter 23: Negotiation and Leadership Skills
Why is negotiation so important as an architect?
Name some negotiation techniques when a business stakeholder insists on five nines of availability, but only three nines are really needed.
What can you derive from a business stakeholder telling you “I needed it yesterday”?
Why is it important to save a discussion about time and cost for last in a negotiation?
What is the divide-and-conquer rule? How can it be applied when negotiating architecture characteristics with a business stakeholder? Provide an example.
List the 4 C’s of architecture.
Explain why it is important for an architect to be both pragmatic and visionary.
What are some techniques for managing and reducing the number of meetings you are invited to?
Chapter 24: Developing a Career Path
What is the 20-minute rule, and when is it best to apply it?
What are the four rings in the ThoughtWorks technology radar, and what do they mean? How can they be applied to your radar?
Describe the difference between depth and breadth of knowledge as it applies to software architects. Which should architects aspire to maximize?
Index
A
acceleration of rate of change in software development ecosystem, 268
accessibility, 59
accidental architecture anti-pattern, 133
accidental complexity, 354
accountability, 61
achievability, 60
ACID transactions, 132
  in service-based architecture, 177
  in services of service-based architecture, 168
actions provided by presentation tools, 321
actor/actions approach to designing components, 111
  in Going, Going, Gone case study, 112
actual productivity (of development teams), 335
adaptability, 62
administrators (network), 129
ADR-tools, 285
ADRs (architecture decision records), 285-295
  as documentation, 293
  auction system example, 294
  basic structure of, 285
  compliance section, 290
  context section, 288
  decision section, 288
  draft ADR, request for comments on, 287
  notes section, 291
  status, 286
  storing, 291
  title, 286
  using for standards, 293
Agile development
  Agile Story risk analysis, 308
  creation of just-in-time artifacts, 317
  extreme programming and, 15
  software architecture and, 18, 101
agility
  process measures of, 81
  rating in service-based architecture, 176
  versus time to market, 67
Ambulance pattern, 312
analyzability, 62
animations provided by presentation tools, 321
anti-patterns
  Big Ball of Mud, 85, 120
  Bullet-Riddled Corpse in corporate presentations, 322
  Cookie-Cutter, 321
  Covering Your Assets, 282
  Email-Driven Architecture, 283
  Entity Trap, 110
  Frozen Caveman, 30
  Generic Architecture, 65
  Groundhog Day, 282
  Irrational Artifact Attachment, 316
  Ivory Tower Architect, 74, 351
anvils dropping effects, 321
Apache Camel, 186
Apache Ignite, 213
Apache Kafka, 146
Apache ODE, 186
Apache Zookeeper, 157
API layer
  in microservices architecture, 249
  in service-based architecture, 167
  security risks of Diagnostics System API gateway, 313
application logic in processing units, 213
application servers
  scaling, problems with, 211
  vendors battling with database server vendors, 235
application services, 237
ArchiMate, 319
architects (see software architects)
architectural extensibility, 32
  in broker topology of event-driven architecture, 182
architectural fitness functions, 17
architectural thinking, 23-36
  analyzing trade-offs, 30
  architecture versus design, 23-25
  balancing architecture and hands-on coding, 34
  self-assessment questions, 373
  understanding business drivers, 34
architecturally significant, 284
architecture by implication anti-pattern, 133
architecture characteristics, 55-64
  about, 55-58
  analyzing for components, 109
  cross-cutting, 59
  defined, self-assessment questions, 374
  definitions of terms from the ISO, 61
  in distributed architecture Going, Going, Gone case study, 274
  fitness functions testing
    cyclic dependencies example, 84-86
    distance from main sequence example, 86-88
  governance of, 82
  identifying, 65-75
    design versus architecture and trade-offs, 74
    extracting from domain concerns, 65-67
    extracting from requirements, 67-69
    self-assessment questions, 374
    Silicon Sandwiches case study, 69-74
  incorporating into Going, Going, Gone component design, 114
  measuring, 77-82
    operational measures, 78
    process measures, 81
    structural measures, 79-81
  measuring and governing, self-assessment questions, 375
  operational, 58
  partial listing of, 58
  ratings in event-driven architecture, 207-209
  ratings in layered architecture, 139
  ratings in microkernel architecture, 160
  ratings in microservices architecture, 263-265
  ratings in orchestration-driven service-oriented architecture, 241
  ratings in pipeline architecture, 146
  ratings in service-based architecture, 174
  ratings in space-based architecture, 233
  scope of, 91-98
    architectural quanta and granularity, 92-98
    coupling and connascence, 92
    self-assessment questions, 375
  structural, 59
  in synchronous vs. asynchronous communications between services, 270
  trade-offs and least worst architecture, 63
architecture decision records (see ADRs)
architecture decisions, 6, 281-295
  anti-patterns, 281-284
    Covering Your Assets, 282
    Email-Driven Architecture, 283
    Groundhog Day, 282
  architecturally significant, 284
  architecture decision records (ADRs), 285-295
  self-assessment questions, 380
architecture fitness function, 83
architecture katas
  origin of, 68
  reference on, 372
  Silicon Sandwiches case study, 69-74
architecture partitioning, 102
architecture quantum, 91
  architectural quanta and granularity, 92-98
    Going, Going, Gone case study, 95-98
  architectural quanta in microservices, 265
  architecture quanta in event-driven architecture, 208
  architecture quanta in space-based architecture, 234
  choosing between monolithic and distributed architectures in Going, Going, Gone component design, 115-116
  in orchestration-driven service-oriented architecture, 242
  quanta boundaries for distributed architecture Going, Going, Gone case study, 276
  separate quanta in service-based architecture, 175
architecture risk, analyzing, 297-314
  Agile story risk analysis, 308
  risk assessments, 298
  risk matrix for, 297
  risk storming, 302-308
    consensus, 304
    identifying areas of risk, 303
    mitigation of risk, 307
  risk storming examples, 308-314
    availability of nurse diagnostics system, 310
    elasticity of nurse diagnostics system, 312
    nurse diagnostics system, 308
    security in nurse diagnostics system, 313
  self-assessment questions, 380
Architecture Sinkhole anti-pattern, 138
architecture sinkhole anti-pattern
  microkernel architecture and, 161
architecture styles, 119-132
  choosing the appropriate style, 267-277
    decision criteria, 269-271
    distributed architecture in Going, Going, Gone case study, 274-277
    monolithic architectures in Silicon Sandwiches case study, 271-274
    self-assessment questions, 379
    shifting fashion in architecture, 267-268
  defined, 119
  fundamental patterns, 119-123
  monolithic versus distributed architectures, 123-132
  self-assessment questions, 376
architecture vitality, 9
archivability, 59
ArchUnit (Java), 36, 87
  fitness function to govern layers, 87
argumentativeness or getting personal, avoiding, 351
armchair architects, 328
arrows indicating direction of risk, 301
The Art of War (Sun Tzu), 350
asynchronous communication, 254, 270
  in event-driven architecture, 196-197
  in microservices implementation of Going, Going, Gone, 276
asynchronous connascence, 92, 94
auditability
  in Going, Going, Gone case study, 96
  performance and, 67
authentication/authorization, 59
authenticity, 61
auto acknowledge mode, 202
automation
  leveraging, 35
  on software projects, drive toward, 82
availability, 57
  basic availability in BASE transactions, 132
  in Going, Going, Gone case study, 97
  implicit architecture characteristic, 73
  in Going, Going, Gone: discovering components case study, 114
  in nurse diagnostics system risk storming example, 310
  Italy-ility and, 60
  in layered architecture, 141
  negotiating with business stakeholders about, 349
  nines of, 349
  performance and, 67
  in pipeline architecture, 148
  rating in service-based architecture, 176
  reliability versus, 60
B
Backends for Frontends (BFF) pattern, 273
bandwidth is infinite fallacy, 126
BASE transactions, 132
  in service-based architecture, 177
basic availability, soft state, eventual consistency (see BASE transactions)
Big Ball of Mud anti-pattern, 85, 120
bottleneck trap, 34
bounded context, 93, 94
  for services in microservices
    data isolation with, 249
  microservices and, 245
  in microservices architecture, 247
    granularity for services, 248
  user interface as part of, 253
broadcast capabilities in event-driven architecture, 203
broker topology (event-driven architecture), 180-185
  benefits and disadvantages of, 185
  example, 182
Brooks’ law, 335
Brooks, Fred, 68, 335, 372
Brown, Simon, 102, 318
bug fixes, working on, 35
build in animations, 321
build out animations, 321
Building Evolutionary Architectures (Ford et al.), 16, 82, 91
Bullet-Riddled Corpse anti-pattern, 322
business and technical justifications for architecture decisions, 282, 343
business delegate pattern, 136
business domains
  knowledge of, 11
  in layered architecture, 135
business drivers, understanding, 34
business layer
  in layered architectures, 133
  shared objects in, 136
Business Process Execution Language (BPEL), 187
business process management (BPM) engines, 187
business rules layer, 104
business stakeholders, negotiating with, 348
C
C’s of architecture, 353
C4 diagramming standard, 318
caching
  data collisions and caches in space-based architecture, 224
  data pumps in caches in space-based architecture, 220
  named caches and data readers, 222
  named caches in space-based architecture, 216
  near-cache considerations in space-based architecture, 230
  replicated vs. distributed in space-based architecture, 227-230
capabilities, new, and shifting fashion in architecture, 268
capacity, 61
career path, developing, 365-372
  developing a personal radar, 367-371
  parting advice, 372
  self-assessment questions, 381
  twenty-minute rule, 365-367
  using social media, 371
chaos engineering, 88
Chaos Gorilla, 88
Chaos Monkey, 88
The Checklist Manifesto (Gawande), 89, 338
checklists, leveraging, 338-343
  developer code completion checklist, 340
  software release checklist, 342
  unit and functional testing checklist, 341
choreography
  of bounded context services in microservices, 248
  in microservices’ communication, 256
circuit breakers, 124
clarity, 355
classes, representation in C4 diagrams, 319
classpath (Java), 40
client acknowledge mode, 202
client/server architectures, 121
  browser and web server, 122
  desktop and database server, 122
  three-tier, 122
closed versus open layers, 135
cloud, space-based architecture implementations on, 226
code reviews by architects, 36
coding, balancing with architecture, 34
coexistence, 61
cohesion, 93
  functional, 92
collaboration, 355
  of architects with other teams, 74
color in diagrams, 320
Common Object Request Broker Architecture (CORBA), 122
communication, 355
  communicating architecture decisions effectively, 283
communication between services, 270
  in microservices architecture, 254-263
  in microservices implementation of Going, Going, Gone, 276
communication connascence, 93
compatibility
  defined, 61
  interoperability versus, 60
compensating transaction framework, 262
competing consumers, 209
competitive advantage, translation to architecture characteristics, 67
complexity in architecture, 353
component-based thinking, 99-116
  choosing between monolithic and distributed architectures in Going, Going, Gone component design, 115-116
  component design, 110-112
  component identification flow, 108-109
  component scope, 99-100
  developers’ role, 108
  granularity of components, 110
  self-assessment questions, 375
  software architect’s role, 101-107
components
  defined, 99, 101
  representation in C4 diagrams, 318
concert ticketing system example (space-based architecture), 231
conciseness, 355
confidentiality, 61
configurability, 59
Conformity Monkey, 88
connascence
  about, 92
  asynchronous, 94
  synchronous, in high functional cohesion, 93
connected components, 80
consistency, eventual, 132
constraints, communication by architect to development team, 325
construction techniques, architecturally significant decisions impacting, 285
Consul, 157
consumer filters, 144
containers in C4 diagramming, 318
context
  architecture katas and, 68
  bounded context and, 245
  bounded context in domain-driven design, 94
  context section of ADRs, 288
  indicating in larger diagram using representational consistency, 315
  representation in C4 diagrams, 318
continuity, 58
continuous delivery, 14
contracts
  data pumps in space-based architecture, 220
  maintenance and versioning, 132
  in microkernel architecture, 158
  in stamp coupling resolution, 126
control freak architects, 327
Conway’s law, 103, 133
  orchestration engine and, 238
Cookie-Cutter anti-pattern, 321
core system in microkernel architecture, 150-153
correlation ID, 204
cost
  justification for architecture decisions, 283
  in orchestration-driven service-oriented architecture, 243
  overall cost in layered architectures, 140
  overall cost in microkernel architecture, 160
  overall cost in pipeline architecture, 147
  overall cost in service-based architecture, 176
  for risk mitigation, 307
  in space-based architecture, 234
  transport cost in distributed computing, 130
coupling
  and connascence, 92
  decoupling of services in microservices, 247
  negative trade-off of reuse, 246
  reuse and, in orchestration-driven service-oriented architecture, 241
Covering Your Assets anti-pattern, 282
Crap4J tool, 81
critical or important to success (architecture characteristics), 75
cross-cutting architecture characteristics, 59
cube or door transitions, 321
customizability, architecture characteristics and, 72
cyclic dependencies between components, 84-86
cyclomatic complexity
  calculating, 79
  good value for, 81
  removal from core system of microkernel architecture, 150
D
data 数据
deciding where it should live, 270 决定它应该在哪里存在,270
preventing data loss in event-driven architecture, 201-203 在事件驱动架构中防止数据丢失,201-203
software architecture and, 19 软件架构和,19
data abstraction layer, 223 数据抽象层,223
data access layer, 223 数据访问层,223
data collisions, 224-226 数据冲突,224-226
cache size and, 226 缓存大小和,226
formula to calculate probable number of, 224 计算可能数量的公式,224
number of processing unit instances and, 226 处理单元实例的数量和,226
data grid, 215 数据网格,215
data isolation in microservices, 249 微服务中的数据隔离,249
data meshes, 356 数据网格,356
“Data Monolith to Data Mesh” article (Fowler), 356 “数据大一统到数据网格”文章 (Fowler), 356
data pumps, 213, 219 数据泵,213,219
data reader with reverse data pump, 223 带有反向数据泵的数据读取器,223
in domain-based data writers, 221 在基于领域的数据写入器中,221
data readers, 213, 222 数据读取器,213,222
data writers, 213, 221 数据写入器,213,221
database entities, user interface frontend built on, 111 数据库实体,用户界面前端构建在,111
Database Output transformer filter, 146 数据库输出变换器过滤器,146
database server, desktop and, 122 数据库服务器,桌面和,122
databases 数据库
ACID transactions in services of service-based architecture, 168 基于服务的架构中服务的 ACID 事务,168
component-relational mapping of framework to, 111 框架的组件关系映射,111
data pump sending data to in space-based architecture, 220 数据泵将数据发送到基于空间的架构中,220
licensing of database servers, problems with, 235 数据库服务器的许可问题,235
in microkernel architecture core system, 151 在微内核架构核心系统中,151
in microkernel architecture plug-ins, 156 在微内核架构插件中,156
in orchestration-driven service-oriented architecture, 242 在以编排驱动的服务导向架构中,242
partitioning in service-based architecture, 169-171 基于服务的架构中的分区,169-171
removing as synchronous constraint in space-based architecture, 212 在基于空间的架构中移除同步约束,212
scaling database server, problems with, 211 扩展数据库服务器,问题,211
in service-based architecture, 164 在基于服务的架构中,164
transactions in service-based architecture, 177 基于服务的架构中的事务,177
variants in service-based architecture, 166 基于服务的架构中的变体,166
DDD (see domain-driven design) DDD(参见领域驱动设计)
demonstration defeats discussion, 350 演示胜于讨论,350
dependencies 依赖关系
architecturally significant decisions impacting, 284 影响架构上重要的决策,284
cyclic, modularity and, 84-86 循环性,模块化和,84-86
timing, modules and, 41 时序、模块和,41
deployability 可部署性
low rating in layered architecture, 140 分层架构中的低评分,140
process measures of, 81 过程度量,81
rating in microkernel architecture, 161 微内核架构中的评级,161
rating in orchestration-driven service-oriented architecture, 242 在以编排驱动的服务导向架构中的评级,242
rating in pipeline architecture, 147 管道架构中的评级,147
rating in service-based architecture, 176 基于服务的架构中的评级,176
deployment 部署
automated deployment in microservices architecture, 263 微服务架构中的自动化部署,263
deployment manager in space-based architecture, 219 基于空间架构中的部署管理器,219
physical topology variants in layered architecture, 134 分层架构中的物理拓扑变体,134
design 设计
architecture versus, 23 架构与,23
versus architecture and trade-offs, 74 与架构和权衡,74
understanding long-term implication of decisions on, 123 理解决策的长期影响,123
design principles in software architecture, 7 软件架构中的设计原则,7
developer code completion checklist, 340 开发者代码完成检查清单,340
developer flow, 362 开发者流程,362
developers 开发人员
drawn to complexity, 354 吸引复杂性,354
negotiating with, 351 谈判,351
role in components, 108 组件中的角色,108
roles in layered architecture, 133 分层架构中的角色,133
development process, separation from software architecture, 14, 101 开发过程,与软件架构的分离,14,101
development teams, making effective, 325-346 开发团队,有效地,325-346
amount of control exerted by software architect, 331-335 软件架构师施加的控制量,331-335
leveraging checklists, 338-343 利用检查清单,338-343
developer code completion checklist, 340 开发者代码完成检查清单,340
software release checklist, 342 软件发布检查清单,342
unit and functional testing checklist, 341 单元和功能测试清单,341
self-assessment questions, 381 自我评估问题,381
software architect personality types and, 326-331 软件架构师个性类型和,326-331
software architect providing guidance, 343-346 软件架构师提供指导,343-346
team boundaries, 325 团队边界,325
team warning signs, 335-338 团队警告信号,335-338
DevOps, 3
adoption of extreme programming practices, 15 极限编程实践的采用,15
intersection with software architecture, 17 与软件架构的交集,17
diagramming and presenting architecture, 315-324 图示和展示架构,315-324
diagramming, 316-321 图示,316-321
guidelines for diagrams, 319 图表指南,319
standards, UML, C4, and ArchiMate, 318 标准,UML,C4 和 ArchiMate,318
tools for, 316 工具,316
presenting, 321-324 展示,321-324
incremental builds of presentations, 322 增量构建演示文稿,322
infodecks vs. presentations, 324 infodecks 与演示文稿,324
invisibility, 324 隐形,324
manipulating time with presentation tools, 321 使用演示工具操控时间,321
slides are half of the story, 324 幻灯片只是故事的一半,324
representational consistency, 315 表现一致性,315
self-assessment questions, 380 自我评估问题,380
diffusion of responsibility, 337 责任扩散,337
direction of risk, 300 风险方向,300
directory structure for storing ADRs, 292 存储 ADRs 的目录结构,292
dissolve transitions and animations, 321 溶解过渡和动画,321
distance from the main sequence metric, 86-88 主序距离度量,86-88
distributed architectures 分布式架构
domain partitioning and, 107 领域分区和,107
in Going, Going, Gone case study, 274-277 在《Going, Going, Gone》案例研究中,274-277
microkernel architecture with remote access plug-ins, 156 微内核架构与远程访问插件,156
microservices, 247 微服务,247
monolithic architectures versus, 123-132, 270 单体架构与,123-132,270
fallacies of distributed computing, 124-131 分布式计算的谬论,124-131
in Going, Going, Gone case study, 115-116 在《Going, Going, Gone》案例研究中,115-116
other distributed computing considerations, 131 其他分布式计算考虑因素,131
orchestration and choreography of services in, 177 服务的编排和协调,177
in service-based architecture style, 175 在基于服务的架构风格中,175
stamp coupling in, 126 印章耦合,126
three-tier architecture and network-level protocols, 122 三层架构和网络级协议,122
Distributed Component Object Model (DCOM), 122 分布式组件对象模型 (DCOM),122
distributed systems, 121 分布式系统,121
distributed transactions, 132 分布式事务,132
distributed vs. replicated caching in space-based architecture, 227-230 基于空间的架构中的分布式缓存与复制缓存,227-230
divide and conquer rule, 350 分而治之法则,350
do and undo operations in transactions, 263 事务中的执行和撤销操作,263
documentation, ADRs as, 293 文档,ADRs 作为,293
domain partitioning (components) 领域分区(组件)
defined, 104 定义,104
in microkernel architecture, 161 在微内核架构中,161
in service-based architecture style, 174 在基于服务的架构风格中,174
in space-based architecture, 234 在基于空间的架构中,234
in Silicon Sandwiches monolithic architectures case study, 271 在硅三明治单体架构案例研究中,271
in Silicon Sandwiches partitioning case study, 107 在硅三明治分区案例研究中,107
technical partitioning versus, 249 技术分区与,249
domain-driven design (DDD), 94 领域驱动设计 (DDD), 94
component partitioning and, 104 组件划分和,104
event storming, 112 事件风暴,112
influence on microservices, 245 对微服务的影响,245
user interface as part of bounded context, 253 用户界面作为限界上下文的一部分,253
domain/architecture isomorphism, 256 领域/架构同构,256
domains 领域
developers defining, 245 开发人员定义,245
domain areas of applications, risk assessment on, 299 应用领域,风险评估,299
domain changes and architecture styles, 268 领域变化和架构风格,268
domain concerns, translating to architecture characteristics, 65-67 领域关注,转化为架构特征,65-67
domain services in service-based architecture, 163, 168 服务导向架构中的领域服务,163,168
domain-based data readers, 223 基于域的数据读取器,223
domain-based data writers, 221 基于域的数据写入器,221
domain-centered architecture in microservices, 265 以领域为中心的微服务架构,265
inspiration for microservices service boundaries, 248 微服务服务边界的灵感,248
in technical and domain-partitioned architectures, 104 在技术和领域分区架构中,104
door or cube transitions, 321 门或立方体转换,321
driving characteristics, focus on, 65 驱动特性,专注于,65
duplication, favoring over reuse, 246 重复,偏向于重用,246
Duration Calculator transformer filter, 146 持续时间计算器变换器过滤器,146
Duration filter, 146 持续时间过滤器,146
dynamic connascence, 92 动态共生性,92
DZone Refcardz, 366
E
Eclipse IDE, 150, 158
effective architects, 330 有效的架构师,330
effective teams (see development teams, making effective) 有效的团队(见开发团队,打造有效团队)
effects offered by presentation tools, 321 演示工具提供的效果,321
elastic scale, 13 弹性扩展,13
elasticity, 70, 97 弹性,70,97
being pragmatic, yet visionary about, 356 务实而又富有远见,356
in Going, Going, Gone case study, 96 在《Going, Going, Gone》案例研究中,96
low rating in layered architecture, 141 分层架构中的低评分,141
low rating in pipeline architecture, 148 管道架构中的低评分,148
rating in microservices architecture, 264 微服务架构中的评分,264
rating in orchestration-driven service-oriented architecture, 243 在以编排驱动的服务导向架构中的评级,243
rating in service-based architecture, 176 基于服务的架构中的评级,176
rating in space-based architecture, 233 基于空间的架构中的评级,233
risks in nurse diagnostics system example, 312 护士诊断系统示例中的风险,312
electronic devices recycling example (service-based architecture), 172-173 电子设备回收示例(基于服务的架构),172-173
Email-Driven Architecture anti-pattern, 283 电子邮件驱动架构反模式,283
engineering practices, software architecture and, 14 工程实践,软件架构和,14
Enterprise 2.0 (McAfee), 371 企业 2.0 (McAfee), 371
entity objects, shared library in service-based architecture, 170 实体对象,基于服务的架构中的共享库,170
entity trap, 110, 249 实体陷阱,110,249
error handling in event-driven architecture, 197-200 事件驱动架构中的错误处理,197-200
errors (user), protection against, 61 错误(用户),防护,61
essential complexity, 354 基本复杂性,354
Evans, Eric, 94 埃文斯,埃里克,94
event broker, 181 事件代理,181
event mediators, 185 事件中介,185
delegating events to, 187 委托事件到,187
event processor, 181, 186 事件处理器,181,186
event queue, 186 事件队列,186
event storming in component discovery, 112 组件发现中的事件风暴,112
event-driven architecture, 179-209 事件驱动架构,179-209
architecture characteristics ratings, 207-209 架构特征评级,207-209
asynchronous capabilities, 196-197 异步能力,196-197
broadcast capabilities, 203 广播能力,203
choosing between request-based and event-based model, 206 在基于请求的模型和基于事件的模型之间进行选择,206
error handling, 197-200 错误处理,197-200
preventing data loss, 201-203 防止数据丢失,201-203
request-reply messaging, 204 请求-回复消息传递,204
self-assessment questions, 377 自我评估问题,377
topology, 180 拓扑,180
broker topology, 180-185 代理拓扑,180-185
mediator topology, 185-195 中介拓扑,185-195
events 事件
commands versus in event-driven architecture, 195 命令与事件驱动架构中的事件,195
use for asynchronous communication in microservices, 255 在微服务中用于异步通信,255
eventual consistency, 132 最终一致性,132
eviction policy in front cache, 230 前缓存中的驱逐策略,230
evolutionary architectures, 83 进化架构,83
event-driven architecture, 209 事件驱动架构,209
microservices, 264 微服务,264
expectations of an architect, 8-13 对架构师的期望,8-13
explicit versus implicit architecture characteristics, 57 显式与隐式架构特征,57
extensibility, 59 可扩展性,59
rating in microkernel architecture, 161 微内核架构中的评级,161
extreme programming (XP), 14, 82 极限编程 (XP), 14, 82
F
fallacies of distributed computing, 124-131 分布式计算的谬误,124-131
bandwidth is infinite, 126 带宽是无限的,126
latency is zero, 125 延迟为零,125
the network is reliable, 124 网络是可靠的,124
the network is secure, 127 网络是安全的,127
the network is homogeneous, 131 网络是同质的,131
the topology never changes, 128 拓扑从未改变,128
there is only one administrator, 129 只有一个管理员,129
transport cost is zero, 130 运输成本为零,130
fast-lane reader pattern, 136 快速通道阅读器模式,136
fault tolerance 容错
layered architecture and, 141 分层架构和,141
microkernel architecture and, 160 微内核架构和,160
pipeline architecture and, 148 管道架构和,148
rating in event-driven architecture, 209 事件驱动架构中的评级,209
rating in microservices architecture, 263 微服务架构中的评级,263
rating in service-based architecture, 176 基于服务的架构中的评级,176
reliability and, 61 可靠性和,61
feasibility, 72 可行性,72
federated event broker components, 181 联邦事件代理组件,181
filters 过滤器
in pipeline architecture, 143 在管道架构中,143
types of, 144 类型的,144
in pipeline architecture example, 146 在管道架构示例中,146
First Law of Software Architecture, 19 软件架构第一法则,19
fitness functions, 17, 36, 83-89 适应度函数,17,36,83-89
flame effects, 321 火焰效果,321
flow, state of, 362 流,状态,362
Foote, Brian, 120
Ford, Neal, 30, 321, 354, 372 福特,尼尔,30,321,354,372
Fowler, Martin, 1, 245, 248, 356 福勒,马丁,1,245,248,356
front cache, 230 前缓存,230
front controller pattern, 258 前端控制器模式,258
frontends 前端
Backends for Frontends (BFF) pattern, 273 前端后端(BFF)模式,273
in microservices architecture, 253-254 在微服务架构中,253-254
Frozen Caveman anti-pattern, 30 冰冻穴居人反模式,30
full backing cache, 230 全备份缓存,230
functional aspects of software, 62 软件的功能方面,62
functional cohesion, 92 功能内聚,92
high, 93 高,93
functions or methods, formula for calculating cyclomatic complexity, 79 函数或方法,计算圈复杂度的公式,79
G
Gawande, Atul, 89, 338
Generic Architecture (anti-pattern), 65 通用架构(反模式),65
Going, Going, Gone case study, 95-98 Going, Going, Gone 案例研究,95-98
using distributed architecture, 274-277 使用分布式架构,274-277
Going, Going, Gone: discovering components case study, 112-115 Going, Going, Gone:发现组件案例研究,112-115
governance for architecture characteristics, 82 架构特征的治理,82
granularity 粒度
architectural quanta and, 92-98 架构量子和,92-98
for services in microservices, 248 在微服务中的服务,248
Groundhog Day anti-pattern, 282 土拨鼠日反模式,282
group potential, 335 群体潜力,335
web applications, scaling to meet increased loads, 211 Web 应用程序,扩展以满足增加的负载,211
web browsers, using microkernel architecture, 158 网页浏览器,使用微内核架构,158
web servers 网络服务器
and browser architecture, 122 和浏览器架构,122
scaling, problems with, 211 扩展,问题,211
Weinberg, Gerald, 357 温伯格,杰拉尔德,357
What Every Programmer Should Know About Object Oriented Design (Page-Jones), 92 《每个程序员都应该知道的面向对象设计》(Page-Jones),92
Why is more important than How, 19 为什么比怎么更重要,19
Wildfly application server, 161 Wildfly 应用服务器,161
workflows 工作流
database relationships incorrectly identified as, 111 数据库关系错误识别为,111
in technical and domain-partitioned architecture, 104 在技术和领域分区架构中,104
workflow approach to designing components, 112 工作流方法设计组件,112
workflow event pattern, 197-200 工作流事件模式,197-200
workflow delegate, 197 工作流委托,197
workflow processor, 198 工作流处理器,198
Y
Yoder, Joseph, 120 约德,约瑟夫,120
About the Authors 关于作者
Mark Richards is an experienced hands-on software architect involved in the architecture, design, and implementation of microservices and other distributed architectures. He is the founder of DeveloperToArchitect.com, a website devoted to assisting developers in the journey from developer to a software architect. Mark Richards 是一位经验丰富的实践型软件架构师,参与微服务和其他分布式架构的架构、设计和实施。他是 DeveloperToArchitect.com 的创始人,该网站致力于帮助开发人员从开发者转变为软件架构师。
Neal Ford is director, software architect, and meme wrangler at ThoughtWorks, a global IT consultancy with an exclusive focus on end-to-end software development and delivery. Before joining ThoughtWorks, Neal was the chief technology officer at The DSW Group, Ltd., a nationally recognized training and development firm. Neal Ford 是 ThoughtWorks 的董事、软件架构师和 meme 管理员,ThoughtWorks 是一家全球性的 IT 咨询公司,专注于端到端的软件开发和交付。在加入 ThoughtWorks 之前,Neal 是 The DSW Group, Ltd. 的首席技术官,该公司是一家全国知名的培训和发展公司。
Colophon 书籍说明
The animal on the cover of Fundamentals of Software Architecture is the red-fan parrot (Deroptyus accipitrinus), a native of South America, where it is known by several names such as loro cacique in Spanish, or anacã, papagaio-de-coleira, and vanaquiá in Portuguese. This New World bird makes its home up in the canopies and tree holes of the Amazon rainforest, where it feeds on the fruits of the Cecropia tree, aptly known as “snake fingers,” as well as the hard fruits of various palm trees. 《软件架构基础》封面上的动物是红扇鹦鹉(Deroptyus accipitrinus),它原产于南美,在西班牙语中被称为 loro cacique,在葡萄牙语中则有 anacã、papagaio-de-coleira 和 vanaquiá等多个名称。这种新世界鸟类栖息在亚马逊雨林的树冠和树洞中,以“蛇指”著称的塞克罗皮亚树的果实以及各种棕榈树的硬果为食。
As the only member of the genus Deroptyus, the red-fan parrot is distinguished by the deep red feathers that cover its nape. Its name comes from the fact that those feathers will “fan” out when it feels excited or threatened and reveal the brilliant blue that highlights each tip. The head is topped by a white crown and yellow eyes, with brown cheeks that are streaked in white. The parrot’s breast and belly are covered in the same red feathers dipped in blue, in contrast with the layered bright green feathers on its back. 作为唯一的德罗普图斯属成员,红扇鹦鹉以覆盖其颈部的深红色羽毛而闻名。它的名字来源于当它感到兴奋或受到威胁时,这些羽毛会“扇”开,露出每个羽毛尖端的亮蓝色。头部顶有白色的冠,黄色的眼睛,脸颊是棕色的,带有白色的条纹。鹦鹉的胸部和腹部覆盖着同样的红色羽毛,浸染着蓝色,与其背部层叠的亮绿色羽毛形成对比。
Between December and January, the red-fan parrot will find its lifelong mate and then begin laying 2-4 eggs a year. During the 28 days in which the female is incubating the eggs, the male will provide her with care and support. After about 10 weeks, the young are ready to start fledging in the wild and begin their 40-year life span in the world’s largest tropical rainforest. 在十二月和一月之间,红扇鹦鹉将找到其终生伴侣,然后开始每年产 2-4 个蛋。在雌鸟孵化蛋的 28 天里,雄鸟将为她提供照顾和支持。大约 10 周后,幼鸟准备在野外开始展翅,并在世界上最大的热带雨林中开始它们 40 年的寿命。
While the red-fan parrot’s current conservation status is designated as of Least Concern, many of the animals on O’Reilly covers are endangered; all of them are important to the world. 虽然红扇鹦鹉目前的保护状态被指定为“无危”,但 O'Reilly 封面上的许多动物都是濒危的;它们对世界都很重要。
The cover illustration is by Karen Montgomery, based on a black and white engraving from Lydekker’s Royal Natural History. The cover fonts are Gilroy Semibold and Guardian Sans. The text font is Adobe Minion Pro; the heading font is Adobe Myriad Condensed; and the code font is Dalton Maag’s Ubuntu Mono. 封面插图由 Karen Montgomery 创作,基于 Lydekker 的《皇家自然历史》中的黑白雕刻。封面字体为 Gilroy Semibold 和 Guardian Sans。正文字体为 Adobe Minion Pro;标题字体为 Adobe Myriad Condensed;代码字体为 Dalton Maag 的 Ubuntu Mono。