This paper addresses a relevant problem in Forensic Sciences by integrating radiological techniques with advanced machine learning methodologies to create a non-invasive, efficient, and less examiner-dependent approach to age estimation. Our study includes a new dataset of 12,827 dental panoramic X-ray images representing the Brazilian population, covering an age range from 2.25 to 96.50 years. To analyze these exams, we employed a model adapted from InceptionV4, enhanced with data augmentation techniques. The proposed approach achieved robust and reliable results, with a Test Mean Absolute Error of 3.1 years and an R-squared value of 95.5%. Professional radiologists have validated that our model focuses on critical features for age assessment used in odontology, such as pulp chamber dimensions and stages of permanent teeth calcification. Importantly, the model also relies on anatomical information from the mandible, maxillary sinus, and vertebrae, which enables it to perform well even in edentulous cases. This study demonstrates the significant potential of machine learning to revolutionize age estimation in Forensic Science, offering a more accurate, efficient, and universally applicable solution.
Keywords Forensic sciences, Age estimation, Deep neural network, Radiological methods
The task of age estimation plays a pivotal role in forensic sciences and civil investigations, aiding in the reconstruction of biological profiles for missing-person cases, confirming the age of younger criminals, and assisting in situations where personal documents are unavailable. This process traditionally relies on morphological, biochemical, and radiological methods, with radiological approaches, particularly panoramic radiography, emerging as the preferred method due to its non-invasiveness, simplicity, and cost-effectiveness. Panoramic radiography allows for assessing dental development stages across all teeth simultaneously, offering a vital tool for age estimation$^{1-15}$.
However, despite advances in radiological techniques, age estimation poses significant challenges, particularly in older individuals. Once dental development is completed, typically by the age of 24 with the closure of the third molar’s apex, traditional manual and visual assessment methods become less effective, creating a gap in accurately determining age in later stages of life$^{2,4}$. Methods like the pulp/tooth area ratio calculation have been explored to address aging in older individuals, focusing on the deposition of secondary dentin$^{16-18}$. However, these methods also face limitations, including bias introduced by examiner subjectivity and decreased effectiveness after age 24$^{19,20}$. Additionally, the formation of reparative (tertiary) dentin, produced by odontoblasts as a defense mechanism against caries progression, further complicates age assessment. This process can appear radiographically similar to normal dental aging, leading to potential confusion. Such similarities are especially problematic when rehabilitated teeth are included in the sample, as they can obscure accurate age determination and introduce additional subjectivity$^{21,22}$.
The integration of Artificial Intelligence (AI) technology, particularly Deep Neural Networks (DNNs), offers a promising solution to overcome the limitations of traditional age estimation methods. Recent studies have explored the potential of neural networks in automating the evaluation of dental development stages, demonstrating comparable accuracy to human observers and suggesting an avenue for enhancing chronological age detection$^{23-34}$. By developing AI-powered solutions that analyze full panoramic radiograph images without the need for prior manual evaluation, it is possible to achieve faster, more efficient, and more accurate age estimation.
This approach not only alleviates the reliance on specialist manual evaluations but also addresses the challenge of estimating age in older individuals, marking a significant advancement in forensic dentistry and anthropology.
While traditional age estimation methods have provided valuable insights, their limitations in assessing older individuals highlight the need for innovative solutions. The application of machine learning models in dental radiography represents a transformative step forward, offering a more reliable and efficient means of age estimation that can more effectively support forensic and civil investigations.
We propose a structured exploration of deep learning age estimation using panoramic dental radiographs in order to build a reliable workflow, depicted in Fig. 1. We begin with the Results section, which presents the empirical outcomes of applying our InceptionV4-based approach to a unique Brazilian subject dataset, covering age ranges from juveniles to adults. Building upon these results, the Discussion offers a deeper analysis, situating our findings within the broader context of forensic science and benchmarking against expert evaluations, thereby illustrating our research’s real-world applicability and implications. The Experiments section details the methodologies employed, from data collection and preprocessing to the intricacies of model training and validation. The Related Work section provides a literature review, surveying previous studies, positioning our research within the existing works, and highlighting the potential of our contribution to forensic dentistry and future advancements.
Results
Our baseline experiment was developed with an adaptation of the InceptionV4 Network with no data augmentation process. The training procedure used 80% of the subjects for training and an additional 10% of the subjects for validation. We achieved a Validation Mean Absolute Error of $3.83 \pm 0.224$ and a Mean Squared Error of $27.83 \pm 0.326$, indicating that the architecture could learn from the exams and predict chronological age after the training process.
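The subject-level partition described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_subjects`, the seed, and the use of integer subject IDs are hypothetical and not taken from the paper. Splitting by subject rather than by image keeps all exams of one person in a single partition.

```python
import random


def split_subjects(subject_ids, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle unique subject IDs and split them into train/val/holdout lists.

    Hypothetical helper: the fractions follow the 80%/10%/10% scheme in the
    text, everything else (seed, ID type) is an assumption for illustration.
    """
    ids = sorted(set(subject_ids))   # de-duplicate and fix a deterministic order
    rng = random.Random(seed)        # local RNG so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_frac)
    n_val = round(len(ids) * val_frac)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    holdout = ids[n_train + n_val:]  # whatever remains (about 10%)
    return train, val, holdout


train, val, holdout = split_subjects(range(1000))
print(len(train), len(val), len(holdout))
```

Exams would then be assigned to a partition by looking up their subject ID, so that no subject contributes images to more than one partition.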
We also evaluated a 10% holdout set composed of 1004 exams. To fully appreciate our predictive model’s performance, we consider metrics beyond the conventionally used Mean Absolute Error (MAE) and Mean Squared Error (MSE). The MAE of $3.88 \pm 0.231$ and the MSE of $26.47 \pm 0.333$ describe the average model error and are consistent with the validation set, but other metrics better capture the variability of the prediction errors.
We assessed the median absolute error, which is 2.78 years. This measure serves as a helpful indicator of the typical error to expect, being less sensitive to extreme values than the mean. It also indicates that half of our predictions have an absolute error below this value.
Another important measure is the Interquartile Range (IQR) of the Absolute Error, calculated as 4.69 years. It offers a robust measure of the spread of our prediction errors. This metric is particularly useful because it represents the range within which the middle 50% of our prediction errors fall, providing a picture of the variability in our predictions while remaining less prone to distortion by potential outliers.
We computed the R-squared coefficient of determination to assess the extent to which age variance is explained by the features extracted from the panoramic radiographs (PRs). Our $R^{2}$ reached 93%, which indicates that a substantial proportion of the variance in age can be explained by the features extracted from the exams.
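The metrics discussed above can be computed from paired actual and predicted ages as in the minimal sketch below. It assumes the textbook definitions (in particular $R^{2} = 1 - SS_{res}/SS_{tot}$ and quartiles by linear interpolation); the paper does not state which exact implementation was used, and the function name `error_metrics` and the toy ages are hypothetical.

```python
import statistics


def error_metrics(y_true, y_pred):
    """Return MAE, MSE, median absolute error, IQR of absolute error, and R^2."""
    abs_err = [abs(p - t) for t, p in zip(y_true, y_pred)]
    sq_err = [(p - t) ** 2 for t, p in zip(y_true, y_pred)]
    # Quartiles of the absolute error; "inclusive" interpolates linearly
    # between order statistics of the observed sample.
    q1, _, q3 = statistics.quantiles(abs_err, n=4, method="inclusive")
    mean_true = statistics.fmean(y_true)
    ss_res = sum(sq_err)                                 # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    return {
        "mae": statistics.fmean(abs_err),
        "mse": statistics.fmean(sq_err),
        "median_ae": statistics.median(abs_err),
        "iqr_ae": q3 - q1,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Toy ages in years, invented purely for illustration.
actual = [10, 20, 30, 40, 50]
predicted = [12, 19, 33, 40, 46]
print(error_metrics(actual, predicted))
```

Because the median and IQR depend only on the ordering of the absolute errors, a few very large errors move them far less than they move the MAE or MSE.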
Lastly, we plotted a Bland-Altman analysis, also known as a difference plot. Examining Fig. 2, we find that our model presents almost no systematic bias in its error distribution: the differences cluster around $-0.09$ on the $y$-axis, and the confidence intervals are symmetric. The confidence intervals in the Bland-Altman plot are calculated as $\mathrm{CI} = \text{mean difference} \pm 1.96 \times \text{standard deviation of the differences}$. This formula gives the range within which 95% of the differences between predicted and actual values are expected to fall, assuming normally distributed errors. Based on the results of the $t$-test ($p$-value: 0.57), there is no statistical evidence to suggest that the predictions from our model are significantly different from the actual ages. Still, the uneven distribution of points along the $x$-axis confirms the existence of random errors.
Fig. 2. Bland-Altman difference plot for the baseline model. It visually assesses the bias and variability between the two measures by plotting the difference between the two measurements on the $y$-axis against their average on the $x$-axis.
Moreover, the Bland-Altman plot reveals a potential relationship between age and the precision of the model’s predictions. The observed cone-shaped spread pattern suggests that the difference between actual and predicted values grows as age increases. This pattern may indicate that our model’s predictive accuracy decreases at higher ages.
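The Bland-Altman quantities discussed above (mean difference, the $\pm 1.96$ standard deviation band, and a test of whether the mean difference is zero) can be sketched as follows. Several choices here are assumptions rather than the paper's procedure: differences are taken as predicted minus actual, the test is a one-sample $t$ statistic on the differences, and its two-sided $p$-value uses a normal approximation, which is reasonable only for large samples such as the roughly 1000-exam holdout set. The function name and toy ages are hypothetical.

```python
import math
import statistics


def bland_altman(y_true, y_pred):
    """Return the points and summary statistics of a Bland-Altman plot."""
    diffs = [p - t for t, p in zip(y_true, y_pred)]        # y-axis values
    means = [(p + t) / 2 for t, p in zip(y_true, y_pred)]  # x-axis values
    bias = statistics.fmean(diffs)   # systematic bias (mean difference)
    sd = statistics.stdev(diffs)     # sample standard deviation of differences
    lower = bias - 1.96 * sd
    upper = bias + 1.96 * sd
    # One-sample t statistic for H0: mean difference == 0.
    t_stat = bias / (sd / math.sqrt(len(diffs)))
    # Assumption: normal approximation to the t distribution (large n only).
    p_value = 2.0 * (1.0 - statistics.NormalDist().cdf(abs(t_stat)))
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "lower": lower,
        "upper": upper,
        "t_stat": t_stat,
        "p_value": p_value,
    }


# Toy ages in years, invented purely for illustration.
result = bland_altman([10, 20, 30, 40], [11, 19, 31, 39])
print(result["bias"], result["lower"], result["upper"], result["p_value"])
```

A widening spread of `diffs` as `means` increases would correspond to the age-dependent loss of precision described in the text.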
This might occur for two reasons: dataset imbalance and the increasing complexity of natural aging. To test the imbalance hypothesis, we conducted a Pearson correlation test to examine whether the training frequency of images by age group was correlated with the model’s performance.
As Fig. 3 indicates, frequency and the model’s MAE have a strong negative correlation. This finding suggests that addressing the imbalance in our dataset could potentially enhance the overall performance of the age estimation model.
It is noteworthy that if we examine younger age groups separately, we observe a positive coefficient, and the MAE tends to increase with age. For patients aged 0 to 19 years, we observe a Pearson correlation coefficient of 0.57. The coefficient is also strongly positive for the age group 20 to 39 years, at 0.63. This apparent instance of Simpson’s paradox might suggest that the model’s performance could be affected both by the imbalance in our data and by the complexity inherent to aging.
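The sign reversal described above can be reproduced on toy numbers. The sketch below is illustrative only: the per-bin frequencies and MAE values are invented and are not the study's data, and `pearson` is a plain implementation of the sample Pearson coefficient. Within each subgroup, frequency and MAE rise together, yet the pooled correlation is negative because the sparsely sampled group has much higher errors overall.

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-age-bin values (number of training images, MAE in years).
young_freq, young_mae = [300, 350, 400, 450], [1.0, 1.3, 1.6, 1.9]
old_freq, old_mae = [50, 80, 110, 140], [5.0, 5.4, 5.8, 6.2]

r_young = pearson(young_freq, young_mae)   # positive within the subgroup
r_old = pearson(old_freq, old_mae)         # positive within the subgroup
r_pooled = pearson(young_freq + old_freq, young_mae + old_mae)  # negative

print(r_young, r_old, r_pooled)
```

A pattern like this is why a pooled correlation alone cannot separate the effect of data imbalance from the effect of age itself.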
We performed several data augmentation tests to tackle this issue, including balancing augmentations focusing on less frequent age groups. As described in the Augmentation Tuning subsection, our best result was observed by tripling the dataset through data augmentation. This means that each original image was synthetically augmented into three new images. Our new model achieved a validation MAE of $3.13 \pm 0.19$ and an MSE of $19 \pm 0.27$ and showed a smaller gap between training and validation errors during training.
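The expansion of each original image into three augmented images can be sketched as below. This is a hedged illustration, not the augmentation pipeline of the study: the intensity jitter stands in for whatever transformations the Augmentation Tuning subsection specifies, images are represented as nested lists of pixel values in [0, 1], and the names `jitter_intensity` and `augment_dataset` are hypothetical. The point shown is only that every augmented copy inherits the age label of its source image.

```python
import random


def jitter_intensity(image, rng, max_gain=0.1, max_shift=0.1):
    """Apply one random gain and shift to all pixels, clipped to [0, 1].

    Placeholder transform chosen for illustration; not the paper's method.
    """
    gain = 1.0 + rng.uniform(-max_gain, max_gain)
    shift = rng.uniform(-max_shift, max_shift)
    return [[min(1.0, max(0.0, gain * px + shift)) for px in row] for row in image]


def augment_dataset(images, ages, copies_per_image=3, seed=0):
    """Create `copies_per_image` augmented images per original, keeping labels."""
    rng = random.Random(seed)   # local RNG so the augmentation is reproducible
    aug_images, aug_ages = [], []
    for image, age in zip(images, ages):
        for _ in range(copies_per_image):
            aug_images.append(jitter_intensity(image, rng))
            aug_ages.append(age)   # augmentation must not change the age label
    return aug_images, aug_ages


# Two tiny 2x2 "images" with invented ages, purely for illustration.
images = [[[0.2, 0.4], [0.6, 0.8]], [[0.0, 1.0], [0.5, 0.5]]]
ages = [12.5, 47.0]
aug_images, aug_ages = augment_dataset(images, ages)
print(len(aug_images), aug_ages)
```

A balancing variant could pass a larger `copies_per_image` for under-represented age groups, which is one way to read the balancing augmentations mentioned in the text.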
In the holdout test, the augmented model demonstrated enhanced performance. It achieved an MAE of $3.1 \pm 0.18$ years, an MSE of $18.46 \pm 0.27$, a Median Absolute Error of 2.16 years, an IQR of 3.55 years, and a higher R-squared coefficient, indicating an overall improvement in prediction precision and a significant improvement in variability, as shown in Table 1.
The observed decrease in the MAE and MSE metrics suggests that incorporating new data through augmentation techniques may have allowed the model to learn more robust feature representations. Because the model makes fewer large prediction errors, the augmented model appears to be better at managing outliers and challenging cases; this might be important, especially for legal age confirmation uses.
At the same time, the reduction in the Median Absolute Error and the Interquartile Range (IQR) indicates that our augmented model delivers more accurate and more consistent predictions. This suggests an increased level of reliability and stability in our model, an assertion that can be further substantiated by examining our error distribution in the upcoming analysis.
$^{1}$Universidade Federal de Pernambuco, Centro de Informática - CIn, Recife 50740-560, Brazil. $^{2}$Universidade Federal de Pernambuco, Centro de Ciências da Saúde, Departamento de Clínica e Odontologia Preventiva, Recife 50670-901, Brazil. email: cz@cin.ufpe.br