Technology Quarterly | Picture this

Artificial intelligence has long been improving diagnoses

But recently the field has exploded

Illustration: Timo Lenzen

BARBARA had a routine mammogram in January 2023. A few weeks later she was asked to visit her doctor in the Aberdeen Royal Infirmary, in Scotland. The mammogram had looked fine to two doctors, but an AI system called Mia had seen something amiss: a six-millimetre patch of subtly-off shades of grey. It was stage 2 cancer. Had it not been spotted at that point and removed, it would not have been caught until Barbara came in for her next routine screen—or until it made its presence known in some other way.

If such tales give a visceral sense of AI’s ability to improve diagnosis, statistics show the scale of the good it could attain. The British government says that analysis of brain scans by e-Stroke, a system developed by Brainomix, a startup spun out of Oxford University, has reduced the time between hospital admission and treatment for stroke patients by more than an hour. It points to as-yet-unpublished data saying that the system’s speed has tripled the number of people achieving functional independence after a stroke from 16% to 48%.

Artificial intelligence has been applied to diagnosis for longer than to any other part of health care, and it shows. But the transformation it offers is far from complete. The AI systems used so far have often been what seem now like quite simple uses of pattern recognition. The foundation models which have so wowed the world since the advent of ChatGPT in 2022 have barely begun to make their mark.

The revolution began in radiology, the first sort of medical imaging to go fully digital. The transition made storing and sharing images easier; it also produced images which could be read by machines. In 2012, when a neural network called AlexNet beat all comers in the annual “ImageNet challenge”, the machines started to come into their own.

Neural networks, inspired by the structure of the brain’s visual cortex, are systems in which information flows through layers of “neurons” stacked one on top of the other. In early neural networks all the neurons in one layer were connected to all the neurons in the next. AlexNet was a “convolutional” neural network—one in which the connections were more sparse, something that allows more discriminate forms of analysis. Combining that architecture with new processors of what was then prodigious power allowed AlexNet to revolutionise the science of computer vision, and with it the potential of automated radiology and, later on, of dermatology, ophthalmology and more.
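
To give a rough sense of what sparser connections buy, the sketch below counts the weights needed to connect two layers in each style. The image size, kernel size and channel counts are illustrative assumptions rather than AlexNet's actual configuration, and bias terms are ignored.

```python
# Illustrative sizes only; none of these figures come from the article.

def dense_weights(n_inputs: int, n_outputs: int) -> int:
    """Fully connected: every input neuron links to every output neuron."""
    return n_inputs * n_outputs


def conv_weights(kernel: int, in_channels: int, out_channels: int) -> int:
    """Convolutional: one small kernel per (input, output) channel pair,
    reused at every position in the image."""
    return kernel * kernel * in_channels * out_channels


height = width = 224            # assumed image size in pixels
channels = 3                    # assumed colour channels
n_values = height * width * channels

# Map an image to a same-sized layer in each style.
dense = dense_weights(n_values, n_values)
conv = conv_weights(kernel=3, in_channels=channels, out_channels=channels)

print(f"fully connected: {dense:,} weights")
print(f"convolutional:   {conv:,} weights")
```

Because the same small kernel is reused across the whole image, the convolutional layer has far fewer weights to learn, which is part of what made training on the processors of the day feasible.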

A sight worth seeing

AlexNet’s descendants are increasingly being used to complement, and sometimes replace, human radiologists. Capio Saint Göran Hospital in Stockholm, Sweden, for example, uses an AI system from Lunit, a South Korean company, as the “second pair of eyes” in its radiography department, instead of having mammograms looked at independently by two radiographers. In Denmark Transpara, a product provided by ScreenPoint Medical, a Dutch company, is used as a first reader of mammograms in low-risk cases.

Being able to do more diagnosis with fewer doctors will be helpful everywhere, but it promises to be a godsend in poor countries. Fujifilm, a Japanese company, has built a 3.5kg, battery-powered x-ray machine which, paired with AI algorithms from an Indian firm, is being used to screen for tuberculosis in rural Nigeria. It can also assess a host of other diseases including pneumonia, chronic obstructive pulmonary disease (COPD) and heart failure. More ambitiously, Darlington Akogo of MinoHealth Labs in Ghana is building a radiology model trained on images from across Africa. Is it too ambitious to expect a diagnostic tool from this process? “Let’s say we are aiming for the stars,” says Dr Akogo. “Even if we miss it what we end up with is radiology assistance.”

Some AI systems can interpret images made with less radiation than normal, thus reducing not just the number of doctors needed to interpret an x-ray but also the dose needed to make it, which is good for patients. They can also look for things doctors would not check for. In “opportunistic screening” an x-ray taken to look for a specific problem is scanned for signs of other trouble as well. Most of the 80m CT scans done in America each year are undertaken to look for a problem in some specific part of the body, but they almost always contain information about other parts, too. Doctors have no interest in passing images taken to look for one thing around their colleagues on the off chance that they reveal something else. Machines have no problem multitasking, and can become experts in identifying many different types of disease.

Ultrasound systems provide another opportunity for AI. Butterfly, an American company, produces a hand-held ultrasound system which, thanks to built-in AI, can be used to assess high-risk pregnancies and to estimate due dates, fetal weights and the amount of amniotic fluid. Such measurements are not otherwise possible outside a clinic, and they normally require a range of instruments. The Bill & Melinda Gates Foundation sees Butterfly’s scanners as a way of bringing down stubbornly high maternal mortality in sub-Saharan Africa. Such AI-enhanced systems—Philips and GE Healthcare are also in the market—have contributions to make beyond maternal care, for example in cardiology, emergency medicine and orthopaedics. Hundreds of Butterfly’s systems are being used in Ukraine to help first responders assess the wounds of war.

Other instruments are also getting an AI make-over. Doctors in primary care in London are evaluating an AI-enabled stethoscope to see if it can improve the diagnosis of some sorts of heart disease. Trials in Oxford are comparing measurements of lung function made using an AI-driven spirometer with previous techniques for picking up COPD.

Jonathan Rothberg, the scientist, engineer and entrepreneur who founded Butterfly, is also one of the founders of Hyperfine, the maker of an innovative portable magnetic-resonance-imaging (MRI) machine called Swoop. Its AI capabilities allow it to assess what is going on using data gathered with the use of comparatively weak magnetic fields. Because low fields are easier to generate, Swoop can be taken to the patient’s bedside, rather than having to sit in a room of its own like high-field MRI machines.

At the other end of the scale EZRA, a firm based in New York, is using AI to drive down the cost of full-body MRI as a cancer-screening tool. Using high-field magnets and proprietary AI it has made scans quicker and thus cheaper; it offers a 30-minute scan for $1,350 and is aiming to bring the cost down to $500. A plain-language, AI-produced account of what has been found is part of the service.

One of the advantages of AI systems is that they can be trained on far more data than a medical student can. Microsoft is collaborating with Paige, a firm that develops AI for pathologists, to build an image-based AI tool for diagnosing cancer that will be fed billions of images; a pathologist looking at one slide a second for a hundred lifetimes would not amass the same amount of experience.

As a paediatric neurologist, Sharief Taraman says he can expect to see thousands of children; but the AI which his Silicon Valley-based company, Cognoa, has built to assess children for autism has been trained on footage of hundreds of thousands. As a result it can use video uploaded by parents, along with a questionnaire, to reach an assessment of their child’s condition.

Reaching an assessment is not enough; the assessment also has to be right. With AI there is the opportunity to match or exceed human performance. For example, AIs seem likely to be able to exceed the ability of human pathologists to “grade” prostate cancers as to whether they are benign or malignant. But showing that a system is good enough takes time, and at the moment algorithms are being generated faster than it is possible to test and regulate them. Hugh Harvey, the boss of Hardian Health, a British firm that assesses medical devices, says that it currently takes at least two years for a medical device to get regulatory approval from scratch.

A sight worth seeing, a vision of you

Looking at the British government’s plans to speed up the use of AIs in the diagnosis of lung cancer, David Baldwin, an honorary professor of medicine at Nottingham University, points out that two recent assessments could not confirm the accuracy and clinical impact of the tools being touted. “This is an example of the pace of development being faster than that of evaluation, and there is much work needed to ensure safe deployment,” he says.

In 2019 a systematic review of the diagnostic accuracy of 82 algorithms in medical imaging found that the ways in which they had been assessed were often sub par. A particular worry was a lack of “prospective” trials which look at outcomes after an intervention, as opposed to retrospective trials which begin with outcomes and go back to look at what went before.

One of the reasons this matters is that prospective trials are better at picking up on “false positives”: cases where a system said something was wrong but it wasn’t. Gerald Lip, a consultant radiologist at NHS Grampian in Scotland, has found that some algorithms, like Mia, which are as good as or better than humans at finding breast cancers still do this at the expense of more false positives, in part because they work off images alone whereas doctors have other sources of information. False positives are a problem for patients because they lead to worry and potentially painful, even dangerous, follow-ups. They are a problem for health systems because they drive up costs.

If AI is to lead to an increase in “opportunistic screening” for other things when an image is made for a specific purpose then false positives will need to be particularly low. Indeed the same is true for all approaches which screen those who are symptom free. When Eric Topol, director of the Scripps Research Translational Institute in San Diego, looks at systems like EZRA, which does full-body scans on people who are normally healthy, he worries about the potential for incidental findings and “rabbit-hole extensive workups with risk and cost” only to discover that there is no cancer. Daniel Sodickson, EZRA’s chief scientific adviser, says the proper response to incidental findings is follow-up imaging to see if anything is changing. There will have to be a lot of good evidence for that doubling-down approach if sceptics like Dr Topol are to be convinced.
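
A worked example may help show why screening the symptom-free is so sensitive to false positives. The sketch below applies Bayes' rule to a hypothetical test; the sensitivity, specificity and prevalence figures are assumptions chosen for illustration and do not describe any product mentioned here.

```python
def positive_predictive_value(prevalence: float,
                              sensitivity: float,
                              specificity: float) -> float:
    """Share of positive results that are real, by Bayes' rule."""
    true_positives = prevalence * sensitivity
    false_positives = (1 - prevalence) * (1 - specificity)
    return true_positives / (true_positives + false_positives)


# Hypothetical test: catches 90% of real cases, clears 95% of healthy people.
for prevalence in (0.10, 0.01, 0.001):
    ppv = positive_predictive_value(prevalence,
                                    sensitivity=0.90,
                                    specificity=0.95)
    print(f"prevalence {prevalence:.1%}: "
          f"{ppv:.1%} of positive results are real")
```

With these assumed figures, about two-thirds of positives are real when one person in ten has the disease, but fewer than one in 50 are when only one in a thousand does. The rarer the condition in the group being screened, the more a given false-positive rate swamps the true findings.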

The situation seems to be improving. As AI gets more mainstream those paying for its use are seeking reliable data to decide what is worth their while. Good prospective studies take time, so it is unsurprising that there are not so many of them early on. Other problems seen in the 2019 study—there were some tests which used the data the system was trained on, rather than data it had not seen before—should become less common as the field matures. Unsurprisingly there are entrepreneurs chafing at assessment processes that cost them and their patients time. Dr Taraman worries that hesitancy about the wider use of tests that offer early diagnosis of autism has clear costs: kids are “missing a window of opportunity and are going to have lifelong consequences.”
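
The flaw of testing a system on the data it was trained on can be shown with a toy case. In the sketch below the "scans" are random numbers with coin-flip labels, so there is nothing real to learn, and the `Memoriser` model is a deliberately bad, hypothetical stand-in that simply stores what it has seen. None of this reflects how any real diagnostic system works.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable


def make_cases(n):
    """Hypothetical 'scans': random features with coin-flip labels."""
    return [((rng.random(), rng.random()), rng.randint(0, 1))
            for _ in range(n)]


class Memoriser:
    """Stores every training case verbatim and learns nothing general."""

    def fit(self, cases):
        self.seen = dict(cases)

    def predict(self, features):
        # Cases never seen before always get the answer 0.
        return self.seen.get(features, 0)


def accuracy(model, cases):
    return sum(model.predict(x) == y for x, y in cases) / len(cases)


train, held_out = make_cases(1000), make_cases(1000)
model = Memoriser()
model.fit(train)

print(f"accuracy on training data: {accuracy(model, train):.0%}")
print(f"accuracy on unseen data:   {accuracy(model, held_out):.0%}")
```

Scored on its own training data the model looks perfect; on cases it has not seen it does no better than a coin toss. Only the second number says anything about how it would fare with new patients.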

A total portrait, with no omissions

A new generation of foundation models that are trained on an assortment of data sources, not just images and vast bodies of text, looks likely to expand the toolbox further. These models do not require the huge quantities of data on which they are trained to be labelled. They have a capacity for “self-supervised” learning, and this can be applied to images, genomic data, gene-expression data, metabolic data, electronic health records, blood tests, and questionnaires on lifestyle and family history.

Image: The Economist

Foundation models should not just provide the wherewithal for better diagnoses of conditions already present. They could also provide better early warning of conditions yet to come, such as cancer, heart disease or diabetes (see chart). In 2022 Chinese researchers showed that this sort of model could predict the risk of severe disease in covid patients. That said, this sort of application of AI needs particular care and attention to make sure the models do not introduce or amplify bias.

This new technology has barely begun to permeate medicine. In 2023 a paper in Nature put that down to the recency of this development and the fact that, while copious text and video are available on the internet (especially if you are not too fussy about copyright), accessing large, diverse sets of medical data is hard. This offers an advantage to companies with big resources; hence the excitement about Microsoft’s collaboration with Paige on a cancer-diagnosis model.

Researchers at Moorfields Eye Hospital in London have been applying AI to ophthalmology since 2016. In September last year Pearse Keane and colleagues at Moorfields and University College London published a foundation model for retinal images produced with Google DeepMind. RETFound, which was pre-trained on over a million images before being shown labelled images of conditions such as diabetic retinopathy and glaucoma, can match the performance of experts when making decisions to refer patients for a number of eye diseases. By picking up tiny changes in the eye’s blood vessels it also appears to predict health conditions such as Parkinson’s disease and stroke. Dr Keane says the technology should be widely available on an open-source basis within two or three years.

This article appeared in the Technology Quarterly section of the print edition under the headline "Picture this"

From the March 30th 2024 edition