
  Unbalanced data processing and multimodal dynamic fusion

  Yuan Yihang


Key Laboratory of Modern Manufacturing Technology, Ministry of Education, Guizhou University, Guiyang, Guizhou 550025


Abstract: As advanced manufacturing technology develops, the fault prediction and diagnosis of key components in mechanical equipment deserve increasing attention. Predicting faults in advance and diagnosing equipment can not only reduce the risk of major disasters but also minimize economic losses. Bearings, tools, rotors, and other key components of mechanical systems operate under high loads and high intensity for long periods, making them prone to failure and wear. Predictive maintenance of these key components is therefore both a focus of enterprise operations and maintenance and a hot topic in academic research. Currently, most research on predictive maintenance of key components in mechanical systems assumes balanced data. In practice, however, the collected data are often imbalanced, with intra-class, inter-class, and time-series imbalances, and are further corrupted by noise, which poses significant challenges for fault prediction and diagnosis. Multimodal fusion integrates information from multiple modalities to achieve more accurate predictions and has made significant progress in a wide range of scenarios, including autonomous driving and medical diagnosis; its development has brought new solutions to fault diagnosis in the industrial sector. Starting from multimodal data, this paper introduces a multimodal fusion technique that integrates data from different modalities, extracts information from each modality, compensates for imbalanced data, and alleviates the impact of data imbalance, thereby addressing the data-imbalance problem in the fault prediction and diagnosis of key mechanical components through multimodal fusion.

Keywords: fault diagnosis; imbalanced data; noise; multimodal data; dynamic fusion

  1. Introduction


In recent years, China has designated "smart manufacturing engineering" as one of the five key projects of "Made in China 2025" and is vigorously promoting the rapid development of the manufacturing industry. Mechanical equipment, as the core foundation of manufacturing, is used widely both in high-precision fields such as spacecraft, military products, and integrated circuits, and in everyday transportation such as automobiles, trains, and subways. However, with the rapid development of technology and growing production demands, the structures of mechanical devices have become increasingly complex in order to cope with complicated and harsh working conditions. In this environment, key components of mechanical systems such as bearings, tools, and rotors may gradually degrade in performance and health due to the combined effects of internal factors (such as self-wear) and external factors (such as high temperature and pressure, excessive loads, and external impacts), and may ultimately fail completely.


Once key components of a mechanical system experience abnormal damage, it can lead to product quality issues and equipment downtime at best, or to irreversible major safety accidents such as injuries or fatalities at worst. However, if people can accurately predict these potential failures in advance and perform equipment maintenance, such tragedies may be averted. Therefore, timely, reasonable, and effective condition monitoring, early warning, and the formulation of maintenance and repair strategies for key components of mechanical systems such as bearings, tools, and rotors are essential to ensure their safe operation and fundamentally prevent catastrophic accidents. The principle of monitoring and predicting key components of mechanical systems involves using sensors for force, temperature, vibration, etc., on the machinery to extract state information of the components, followed by feature extraction, and establishing models based on feature data for fault monitoring and lifespan prediction, thereby providing reasonable maintenance and care suggestions to the staff. However, machinery primarily operates under normal conditions.

In the normal state, data are easy to collect and abundant, whereas the time spent in fault states is usually short and fault data are relatively scarce, which leads to an imbalance in the state data [3].

Generally, methods to address the imbalance problem include feature selection methods, improved classifier methods, and resampling techniques, with feature selection often combined with the other two. Compared with resampling techniques, classifier improvement methods do not change the distribution of the original dataset; instead, they address the imbalance by assigning a higher misclassification cost to minority samples, so achieving good results in imbalanced classification scenarios characterized by noisy data and overlapping classes can be challenging. Resampling techniques, especially oversampling, have potential advantages in handling noisy data and class overlap. In addition to traditional sampling-based fault diagnosis frameworks, GAN (Generative Adversarial Network)-based fault diagnosis techniques have also become very popular in recent years. However, both traditional oversampling techniques applied to imbalanced fault monitoring and GAN-based oversampling techniques fail to clearly describe the sample distribution characteristics of noisy, imbalanced, small datasets, and these shortcomings often lead to less than ideal outcomes in most research in this field. Studying the complexity of such data should therefore be a priority. Wei Jian'an's team emphasizes that challenges related to "noise," "data imbalance," "limited samples," and other complexities are urgent pain points that need attention in this field (Yuan et al., 2023).

At the same time, with the arrival of the big data era, single-modal data analysis has become insufficient for complex tasks. Multimodal learning aims to extract information from multiple data sources and enhance prediction and decision-making by integrating features from various modalities. For example, sentiment analysis can combine the tone and speed of speech with the semantic features of text to comprehensively assess the user's emotional state. Effectively processing multimodal data and organically integrating it can not only improve the predictive power of the model but also enhance the robustness and generalization ability of the system.

Our perception of the world is diverse: we experience it through touch, vision, hearing, smell, and taste. Although some sensory signals are unreliable, humans can extract useful clues from imperfect multimodal inputs and piece together the entire scene of an event. With the development of sensor technology, various forms of data can easily be collected for analysis. To fully utilize the information in each modality, multimodal fusion has become a promising approach for handling imbalanced data or small samples, obtaining accurate and reliable predictions by integrating information from multiple modalities, for example in medical image analysis, autonomous vehicles, and emotion recognition. It also offers significant potential for the imbalanced fault data problem in mechanical fault diagnosis: the data obtained from a single modality are limited, but by integrating information from modalities such as images, videos, and tables, the reliability of the data can be greatly enhanced. Intuitively, fusing information from different modalities makes it possible to explore cross-modal correlations and achieve better performance.

In this article, I introduce a new multimodal fusion framework: the Predictive Dynamic Fusion (PDF) framework. This framework effectively reduces the upper bound of the generalization error and significantly improves the reliability and stability of multimodal systems. Specifically, PDF predicts the collaborative belief (Co-Belief) of each modality, composed of Mono-Confidence and Holo-Confidence, which derive respectively from the intra-modal negative covariance and the inter-modal positive covariance between the fusion weights and the loss. In addition, data quality in open environments changes constantly, so prediction errors are inevitable. To address this, the framework further proposes a relative calibration method that calibrates the predicted Co-Belief from the perspective of the whole multimodal system, meaning that the relative advantage of each modality should change dynamically with the quality of the other modalities rather than remain static. Experiments show that this method generalizes well
and achieves excellent results on multiple datasets [10].
  2. Theory

In this section, the basic setup and formulas of multimodal fusion are first illustrated. Next, the formula for the generalization error bound is revisited, and its connection to the fusion weights is established, revealing the theoretical guarantee for reducing the upper limit of the generalization error. Finally, a predictable dynamic fusion framework that meets the above theoretical analysis is proposed.

  2.1 Basic Settings


Given a multimodal task, let $\mathcal{M}$ be the set of modalities, so that $|\mathcal{M}|$ is its cardinality. The training dataset is $\mathcal{D}_{\text{train}}=\{x_{i}, y_{i}\}_{i=1}^{N} \in \mathcal{X} \times \mathcal{Y}$, where $N$ is the number of samples in $\mathcal{D}_{\text{train}}$, each $x_{i}=\{x_{i}^{m}\}_{m=1}^{|\mathcal{M}|}$ contains $|\mathcal{M}|$ modalities, and $y_{i} \in \mathcal{Y}$ is the corresponding label. The goal of the framework is to design a predictable fusion weight $\omega$ for each modality and achieve robust multimodal fusion. The unimodal projection function $f^{m}: \mathcal{X} \rightarrow \mathcal{Y}$, with $m \in \mathcal{M}$, is trained while its fusion weight $\omega^{m}$ is adjusted dynamically during training. Decision-level multimodal fusion is:
$$f(x)=\sum_{m=1}^{|\mathcal{M}|} \omega^{m} \cdot f^{m}\left(x^{m}\right)$$
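As a minimal illustration of this decision-level fusion, the sketch below sums the per-modality class logits weighted by per-sample fusion weights; the modality names and tensor shapes are illustrative assumptions, not part of the original framework.

```python
import torch

def decision_level_fusion(unimodal_logits, weights):
    """Weighted decision-level fusion: f(x) = sum_m w_m * f_m(x_m).

    unimodal_logits: list of tensors, each of shape (batch, num_classes),
                     one per modality.
    weights: tensor of shape (batch, num_modalities) holding one fusion
             weight per modality and sample.
    """
    fused = torch.zeros_like(unimodal_logits[0])
    for m, logits in enumerate(unimodal_logits):
        fused = fused + weights[:, m:m + 1] * logits
    return fused

# Toy usage with two hypothetical modalities and 3 classes.
logits_a = torch.randn(4, 3)                    # e.g., vibration-signal branch
logits_b = torch.randn(4, 3)                    # e.g., image branch
w = torch.softmax(torch.randn(4, 2), dim=1)     # per-sample modality weights
print(decision_level_fusion([logits_a, logits_b], w).shape)  # torch.Size([4, 3])
```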

  2.2 Generalization Error Upper Bound


The generalization error bound (GEB) is an important concept in machine learning, referring to an upper bound on a model's error on unseen data (Zhang et al., 2023) [11]. Generally, the smaller the generalization error bound, the better the model's generalization ability, i.e., the better it performs on the unknown joint distribution. For binary classification, the generalization error (GE) of a model f can be defined as:
$$\operatorname{GE}(f)=\mathbb{E}_{(x, y) \sim \mathcal{D}}[\ell(f(x), y)]$$

where $\ell$ is the convex log-sigmoid loss function and $\mathcal{D}$ is the unknown data distribution.

  2.3 Confidence Level

  2.3.1 Single-Modality Confidence (Mono-Confidence)


Using the loss directly as the fusion weight for each modality presents significant challenges. Notably, the loss is progressively minimized as training proceeds.

In this regime, even a slight deviation in the loss can cause substantial disturbances. This sensitivity to small errors in loss estimation may undermine the stability and effectiveness of the weights. Furthermore, the loss ranges from zero to positive infinity, making precise prediction extremely difficult. To mitigate these challenges, the method replaces the loss with the probability of the true class label, $p_{\text{true}} \in [0,1]$, which is inversely related to the loss through $\ell=-\log p_{\text{true}}$.


By analyzing the properties of $p_{\text{true}}$, it was found that it reflects the confidence of a modality, as elaborated in earlier work (Corbière et al., 2019) [12]. Using $p_{\text{true}}$ as the fusion weight not only helps to reduce the upper bound of the generalization error but also provides a theoretical guarantee for dynamic multimodal fusion. Since the predictable $p_{\text{true}}$ considers only the confidence of the current modality, it is defined as the Mono-Confidence:
$$\text{Mono-Conf}^{m}=\hat{p}_{\text{true}}^{m}$$

Here $\hat{p}_{\text{true}}^{m}$ denotes the prediction of $p_{\text{true}}^{m}$, because true labels are not available in the testing phase.
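During training, $p_{\text{true}}^{m}$ can be read off the softmax output at the ground-truth class; at test time the framework uses a predicted $\hat{p}_{\text{true}}^{m}$ instead, which is not shown in this minimal sketch (tensor shapes are illustrative).

```python
import torch
import torch.nn.functional as F

def mono_confidence(logits, labels):
    """Mono-Conf^m = p_true^m: softmax probability assigned to the true class.

    logits: (batch, num_classes) output of one unimodal branch.
    labels: (batch,) ground-truth class indices (available only in training).
    """
    probs = F.softmax(logits, dim=1)
    p_true = probs.gather(1, labels.unsqueeze(1)).squeeze(1)  # (batch,)
    return p_true

# The corresponding loss satisfies l = -log(p_true).
logits = torch.randn(4, 3)
labels = torch.tensor([0, 2, 1, 1])
p_true = mono_confidence(logits, labels)
loss = -torch.log(p_true)   # equals F.cross_entropy(logits, labels, reduction="none")
```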

  2.3.2 Overall Confidence (Holo-Confidence)


To construct the weight of a given modality, consider using the sum of the losses of the other modalities. Based on the relation $\ell=-\log p_{\text{true}}$, $\ell$ can again be replaced with $p_{\text{true}}$. Because this term captures cross-modal interactions of $p_{\text{true}}$, it is defined as the Holo-Confidence:
$$\text{Holo-Conf}^{m}=\frac{\sum_{j \neq m}^{|\mathcal{M}|} \hat{\ell}^{j}}{\sum_{i=1}^{|\mathcal{M}|} \hat{\ell}^{i}}=\frac{\log \prod_{j \neq m} \hat{p}_{\text{true}}^{j}}{\log \prod_{i=1}^{|\mathcal{M}|} \hat{p}_{\text{true}}^{i}}$$

where $\hat{\ell}^{i}$ and $\hat{\ell}^{j}$ are the predictions of $\ell^{i}$ and $\ell^{j}$, respectively.
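A minimal sketch of Holo-Confidence computed from (predicted) per-modality losses; the tensors and values below are illustrative only.

```python
import torch

def holo_confidence(pred_losses, m):
    """Holo-Conf^m: ratio of the other modalities' predicted losses to the total.

    pred_losses: (batch, num_modalities) predicted losses l_hat^i = -log p_hat_true^i.
    m: index of the modality whose Holo-Confidence is computed.
    """
    total = pred_losses.sum(dim=1)              # sum over all modalities
    others = total - pred_losses[:, m]          # sum over j != m
    return others / total.clamp_min(1e-8)

# Two-modality toy example: the modality with the smaller own loss
# receives the larger Holo-Confidence.
losses = torch.tensor([[0.2, 1.5],
                       [0.9, 0.3]])
print(holo_confidence(losses, m=0))             # tensor([0.8824, 0.2500])
```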

  2.3.3 Collaborative Confidence (Co-Belief)


Since the Mono-Confidence and Holo-Confidence promote collaborative interaction between modalities, the collaborative confidence (Co-Belief) is defined as a linear combination of the predictable Mono-Confidence and Holo-Confidence and can be used as the final fusion weight:

$$\text{Co-Belief}^{m}=\text{Mono-Conf}^{m}+\text{Holo-Conf}^{m} \qquad (5)$$

  3. Method

To achieve reliable predictions, a relative calibration strategy is proposed to calibrate the predicted Co-Belief and address the inevitable uncertainty. With this reliable prediction, the multimodal fusion framework is referred to as Predictive Dynamic Fusion (PDF).

It is worth noting that in open environments, data quality often changes dynamically, leading to inevitable uncertainty in predictions. To reduce the potential uncertainty of collaborative beliefs in complex scenarios, a method called relative calibration (RC) is further proposed, which calibrates the collaborative beliefs of predictions from the perspective of multimodal systems. This means that the relative advantages of each modality should change dynamically with variations in the quality of other modalities, rather than changing statically.


First, the distribution uniformity $\mathrm{DU}^{m}$ of the m-th modality in the multimodal system is defined as
$$\mathrm{DU}^{m}=\frac{1}{C} \sum_{i=1}^{C}\left|\operatorname{Softmax}\left(f^{m}\left(x^{m}\right)\right)_{i}-\mu\right|$$

where $C$ is the number of classes and $\mu=\frac{1}{C}$ is the average probability. The probability distribution after the softmax provides key insight into the model's uncertainty: a uniform distribution usually indicates high uncertainty, while a peaked distribution indicates low uncertainty in the prediction (Huang et al., 2021a) [13].
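A minimal sketch of the distribution-uniformity term, computed directly from one branch's logits (shapes are illustrative).

```python
import torch
import torch.nn.functional as F

def distribution_uniformity(logits):
    """DU^m = (1/C) * sum_i |Softmax(f^m(x^m))_i - 1/C|.

    logits: (batch, C) logits of one unimodal branch.
    Returns a (batch,) tensor; larger values indicate a more peaked
    (less uncertain) distribution, smaller values indicate more uncertainty.
    """
    probs = F.softmax(logits, dim=1)
    C = probs.shape[1]
    return (probs - 1.0 / C).abs().mean(dim=1)

# A confident (peaked) prediction yields a larger DU than a nearly flat one.
print(distribution_uniformity(torch.tensor([[8.0, 0.0, 0.0],
                                            [0.1, 0.0, 0.1]])))
```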

Considering the ever-changing environment, the uncertainty of different modalities in a multimodal system should be relative, meaning that the uncertainty of each modality should change dynamically with the variations in the uncertainties of other modalities. One modality should dynamically perceive the changes in other modalities and adjust its relative contribution to the multimodal system. Therefore, this framework introduces relative calibration (RC) to calibrate the relative uncertainty of each modality. The relative calibration of the m-th modality can be formulated as follows (for the two-modality case, with $m, n \in \mathcal{M}$):
$$\mathrm{RC}^{m}=\frac{\mathrm{DU}^{m}}{\mathrm{DU}^{n}}=\frac{\sum_{i=1}^{C}\left|\operatorname{Softmax}\left(f^{m}\left(x^{m}\right)\right)_{i}-\mu\right|}{\sum_{i=1}^{C}\left|\operatorname{Softmax}\left(f^{n}\left(x^{n}\right)\right)_{i}-\mu\right|}$$

Considering real-world factors, $\mathrm{RC}^{m}$ takes an asymmetric form to further calibrate the Co-Belief. Specifically, a modality with $\mathrm{RC}^{m}<1$ is assumed to have greater uncertainty and tends to produce relatively unreliable predictions of $\hat{p}_{\text{true}}^{m}$ (Gawlikowski et al., 2023), so the accuracy of its Co-Belief carries potential risk. Therefore, the contribution of such a modality is reduced by multiplying its predicted Co-Belief by $\mathrm{RC}^{m}$ ($\mathrm{RC}^{m}<1$). In contrast, a modality with $\mathrm{RC}^{m}>1$ is considered to have lower uncertainty and an accurate Co-Belief, so its contribution is kept unchanged to reduce optimization difficulty. Based on this, the asymmetric calibration term is defined as:
$$k^{m}= \begin{cases}\mathrm{RC}^{m}=\dfrac{\mathrm{DU}^{m}}{\mathrm{DU}^{n}} & \text{if } \mathrm{DU}^{m}<\mathrm{DU}^{n} \\ 1 & \text{otherwise}\end{cases}$$
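A minimal sketch of the asymmetric calibration term for the two-modality case, following the case distinction above (names and values are illustrative).

```python
import torch

def calibration_factor(du_m, du_n):
    """k^m = DU^m / DU^n if DU^m < DU^n, otherwise 1.

    du_m, du_n: (batch,) distribution-uniformity values of the two modalities.
    The more uncertain modality (smaller DU) is down-weighted; the other
    keeps its Co-Belief unchanged.
    """
    rc = du_m / du_n.clamp_min(1e-8)
    return torch.where(du_m < du_n, rc, torch.ones_like(rc))

du_image = torch.tensor([0.10, 0.40])
du_text = torch.tensor([0.30, 0.20])
print(calibration_factor(du_image, du_text))   # tensor([0.3333, 1.0000])
```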

Applying the asymmetric calibration strategy to the Co-Belief of the m-th modality, the calibrated Co-Belief (CCB) is:
$$\mathrm{CCB}^{m}=\text{Co-Belief}^{m} \cdot k^{m}$$

The framework uses the CCB of each modality as its fusion weight in the multimodal system
$$f(x)=\sum_{m=1}^{|\mathcal{M}|} \omega^{m} \cdot f^{m}\left(x^{m}\right)=\sum_{m=1}^{|\mathcal{M}|} \operatorname{Softmax}\left(\mathrm{CCB}^{m}\right) \cdot f^{m}\left(x^{m}\right)$$
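A hedged end-to-end sketch of the fusion step for the two-modality case, with illustrative tensor shapes: the Co-Belief is formed from Mono- and Holo-Confidence, calibrated asymmetrically, and passed through a softmax across modalities to obtain the fusion weights.

```python
import torch

def pdf_fuse(unimodal_logits, mono_conf, holo_conf, du):
    """Fuse unimodal decisions with calibrated Co-Belief (CCB) weights.

    unimodal_logits: list of (batch, C) tensors, one per modality.
    mono_conf, holo_conf, du: (batch, 2) tensors with Mono-Confidence,
        Holo-Confidence and distribution uniformity (two-modality case).
    """
    co_belief = mono_conf + holo_conf                       # Eq. (5)
    # Asymmetric calibration: down-weight the more uncertain modality.
    du_other = du.flip(dims=[1])                            # swap the two columns
    rc = du / du_other.clamp_min(1e-8)
    k = torch.where(du < du_other, rc, torch.ones_like(rc))
    ccb = co_belief * k
    weights = torch.softmax(ccb, dim=1)                     # softmax across modalities
    fused = sum(weights[:, m:m + 1] * logits
                for m, logits in enumerate(unimodal_logits))
    return fused, weights
```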

As described above, the predictive dynamic fusion framework consists of seven parts, as shown in Figure 1: input data preparation, Mono-Confidence calculation, Holo-Confidence calculation, Co-Belief calculation, the relative calibration strategy, calibrated Co-Belief calculation, and multimodal fusion. The framework principle is shown in Figure 2.
  Figure 1. Predictive Dynamic Fusion Framework

  4. Experiment

  4.1 Data Processing

  4.1.1 Select dataset


Choose a task dataset that contains multimodal data and has imbalanced labels. Multimodal methods are often used in sentiment analysis and medical diagnosis, so multimodal datasets in these areas can be utilized, with modalities mainly including images, videos, text, and tables.

  4.1.2 Data Preprocessing

  Preprocessing steps based on data types:
  Image data:

Resize: adjust all images to a fixed size (e.g., $224 \times 224$ pixels). Normalization: normalize pixel values to $[0,1]$ or standardize them (subtract the mean, divide by the standard deviation). Data augmentation: apply operations such as rotation, flipping, and cropping to increase the diversity of the training data.
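A minimal image-preprocessing sketch assuming PyTorch/torchvision; the specific sizes, angles, and normalization statistics are illustrative choices, not prescribed by the framework.

```python
from torchvision import transforms

# Training pipeline: augmentation + resize + normalization.
train_transform = transforms.Compose([
    transforms.RandomHorizontalFlip(),            # augmentation: flipping
    transforms.RandomRotation(degrees=10),        # augmentation: rotation
    transforms.RandomResizedCrop(224),            # augmentation: cropping + resize
    transforms.ToTensor(),                        # scales pixel values to [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],   # standardize per channel
                         std=[0.229, 0.224, 0.225]),
])

# Evaluation pipeline: deterministic resize + normalization only.
eval_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```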

Figure 2. Framework schematic: (a) PDF (Ours); (b) Static Late Fusion

  Text data:

Text cleaning: remove stop words, punctuation, non-alphabetic characters, etc. Tokenization: convert the text into words or subwords. Word vectorization: use pre-trained word vectors (such as GloVe or Word2Vec) or encode the text with a pre-trained model such as BERT.
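For the text-preprocessing steps above, a minimal sketch assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (both are illustrative assumptions).

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

texts = ["bearing vibration rises sharply", "tool wear within normal range"]
batch = tokenizer(texts, padding=True, truncation=True,
                  max_length=128, return_tensors="pt")

with torch.no_grad():
    outputs = encoder(**batch)

# Use the [CLS] embedding as the text feature f_text.
f_text = outputs.last_hidden_state[:, 0, :]    # shape (batch, 768)
```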

  Table data:

Missing value imputation: preprocess the missing parts of tabular data using self-supervised learning methods or common interpolation methods (such as mean, median, or KNN imputation) [14]; in particular, missing values can be filled in by model predictions. Standardization: standardize numerical features to ensure consistent scales across features [15].
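A minimal sketch of the tabular preprocessing above, assuming scikit-learn; KNN imputation and standardization are shown, with mean or median imputation (SimpleImputer) as simple alternatives.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.preprocessing import StandardScaler

X = np.array([[1.0, 200.0],
              [2.0, np.nan],
              [np.nan, 180.0],
              [4.0, 220.0]])

# KNN imputation; alternatives: SimpleImputer(strategy="mean") or "median".
X_imputed = KNNImputer(n_neighbors=2).fit_transform(X)

# Standardization: zero mean, unit variance per feature.
X_scaled = StandardScaler().fit_transform(X_imputed)
print(X_scaled.mean(axis=0).round(6), X_scaled.std(axis=0).round(6))
```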
  Label data:
  Label encoding: for classification tasks, convert the labels using one-hot encoding (One-Hot Encoding) or label encoding (Label Encoding).
  4.2 Predictive Dynamic Fusion Framework
  4.2.1 Weight Calculation

For image and text features $f_{\text{image}}$ and $f_{\text{text}}$, the weight of each modality is calculated through a fully connected (FC) layer or a self-attention mechanism.

This mechanism dynamically adjusts weights based on the features of the current modality and the context of the task.
$$\alpha_{\text{image}}=\frac{\exp\left(W_{\text{image}} f_{\text{image}}\right)}{\exp\left(W_{\text{image}} f_{\text{image}}\right)+\exp\left(W_{\text{text}} f_{\text{text}}\right)}, \qquad \alpha_{\text{text}}=1-\alpha_{\text{image}}$$

where $W_{\text{image}}$ and $W_{\text{text}}$ are learnable weight matrices representing the contributions of the image and text features.
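A minimal sketch of the weight computation above, assuming each modality feature is projected to a scalar score by a learnable linear map (standing in for $W_{\text{image}}$ and $W_{\text{text}}$) and the two scores are normalized with a softmax.

```python
import torch
import torch.nn as nn

class ModalityWeighting(nn.Module):
    """Computes alpha_image and alpha_text from f_image and f_text."""

    def __init__(self, dim_image, dim_text):
        super().__init__()
        self.w_image = nn.Linear(dim_image, 1)   # plays the role of W_image
        self.w_text = nn.Linear(dim_text, 1)     # plays the role of W_text

    def forward(self, f_image, f_text):
        scores = torch.cat([self.w_image(f_image),
                            self.w_text(f_text)], dim=1)   # (batch, 2)
        alphas = torch.softmax(scores, dim=1)
        # alpha_text = 1 - alpha_image holds by construction of the softmax.
        return alphas[:, 0], alphas[:, 1]

# Toy usage with hypothetical feature sizes.
weighting = ModalityWeighting(dim_image=512, dim_text=768)
a_img, a_txt = weighting(torch.randn(4, 512), torch.randn(4, 768))
```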

After the weight of each modality is calculated, the modalities are fused dynamically according to the predictive dynamic fusion framework described above, with the dynamic fusion weights computed from the various confidence measures.

  4.2.2 Classification Layer

  The fused feature $f_{\text{fusion}}$ is fed into a fully connected layer for classification:
$$\widehat{y}=\operatorname{softmax}\left(W_{\text{fusion}} f_{\text{fusion}}\right)$$

where $W_{\text{fusion}}$ is the weight matrix of the fully connected layer and $\widehat{y}$ is the predicted class probability, from which the predicted category label is obtained [16].
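A minimal sketch of the classification layer; the fused feature size and number of classes are hypothetical.

```python
import torch
import torch.nn as nn

num_classes = 5          # hypothetical number of fault categories
fusion_dim = 256         # hypothetical fused feature size

classifier = nn.Linear(fusion_dim, num_classes)   # W_fusion (plus bias)

f_fusion = torch.randn(8, fusion_dim)
logits = classifier(f_fusion)
y_hat = torch.softmax(logits, dim=1)              # predicted class probabilities
pred_labels = y_hat.argmax(dim=1)                 # predicted category labels
```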

  4.2.3 Model Training

  Establish the loss function
  Use cross-entropy loss function to optimize multi-class problems:
$$\mathcal{L}=-\sum_{i=1}^{N} y_{i} \cdot \log\left(\widehat{y}_{i}\right)$$

where $y_{i}$ is the true label and $\widehat{y}_{i}$ is the predicted probability for sample $i$.
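A minimal training-step sketch using the cross-entropy loss above; note that torch.nn.functional.cross_entropy applies the softmax internally, so it is given the pre-softmax logits (the model, optimizer, and sizes are illustrative).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, fusion_dim = 5, 256                  # hypothetical sizes
classifier = nn.Linear(fusion_dim, num_classes)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

def train_step(f_fusion, labels):
    """One optimization step on a batch of fused features and integer labels."""
    logits = classifier(f_fusion)                 # (batch, num_classes)
    loss = F.cross_entropy(logits, labels)        # equivalent to -sum_i y_i * log(y_hat_i)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

loss_value = train_step(torch.randn(8, fusion_dim),
                        torch.randint(0, num_classes, (8,)))
```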