Domestic research status
In China, sEMG-related technologies have also advanced considerably. In high-density acquisition systems, Li Yidong (2015) designed a 128-channel array sEMG acquisition device that adopted a sub-module architecture (8 independent acquisition modules plus a data-fusion module), achieved parallel acquisition at a 1 kHz sampling rate, and transmitted data wirelessly via WiFi; a Butterworth notch filter was introduced to suppress power-frequency interference, and inter-channel crosstalk was kept below 5%. In low-cost wearable devices, Wansha et al. (2012) developed a multi-channel sEMG sensing system based on the LabVIEW platform that integrated preprocessing circuits and a data interface board, collected 4-channel signals in real time, and performed finger-force tracking analysis; the system cost was only about 20% of that of imported equipment. In electrode technology, Zhao Zhangyan (2010) developed linear electrodes, printed electrodes, and spring-type probes, which effectively solved the problem of traditional Ag/AgCl electrodes detaching easily, and verified electrode stability through vector impedance testing (<10 kΩ at 50 Hz); a motion-artifact filtering circuit was also developed, reducing baseline fluctuation by 70%.
In signal processing and feature extraction, nonlinear feature modeling has gradually become mainstream. Cao Ang et al. (2018) proposed an instantaneous-frequency feature extraction method based on EEMD-HT (ensemble empirical mode decomposition and Hilbert transform), which, combined with band spectral entropy (BSE) and a PSO-SVM optimization algorithm, achieved more than 90% muscle fatigue classification accuracy, better than the traditional frequency-domain method (78%). Luo Zhizeng et al. (2010) used the wavelet packet transform (WPT) to extract sub-band energy features and, combined with an LVQ neural network, recognized four types of hand movements (such as wrist extension and wrist flexion) with 96% accuracy, significantly higher than the single frequency-domain method (82%). For dynamic signal segmentation, Wu Yansheng (2019) proposed an adaptive segmented detection algorithm based on a rolling absolute-value average, which, combined with six-layer wavelet decomposition and anti-shake processing, reduced the false action detection rate from 15% to 5%.
In terms of classification algorithms and application systems, traditional machine learning methods still have certain advantages. Zhang Xu (2010) built an 8-channel sEMG real-time gesture recognition system through anatomically guided sensor layout optimization, and used SVM classifiers to achieve recognition of 20 types of fine gestures, with an online control delay of less than 200ms. In the preliminary exploration of deep learning, Cao Shuhao (2019) built a 32-layer 1D ResNet model based on the Swiss Ninapro database, and achieved an accuracy of 85% in 50 types of gesture classification tasks, although it was limited by the sample size (only 12,000 samples). In the field of rehabilitation robot control, Sun Xin (2010) established an sEMG-based elbow angle BP neural network mapping model with a prediction error of less than 8°, and integrated it into a 5-DOF exoskeleton robot to achieve autonomous motion control for paralyzed patients.
Chapter 2 sEMG Signal Basics and Deep Learning Theory
2.1 Physiological basis of sEMG signals
As a key bioelectric signal reflecting human muscle activity, surface electromyography (sEMG) has gradually become an important tool for studying neuromuscular function since it was introduced into the field of sports physiology and rehabilitation medicine in the 1960s. The generation of sEMG signals originates from the electrical stimulation of skeletal muscle fibers by motor neurons, which in turn triggers muscle contraction. Whenever the brain issues a movement command, nerve impulses are transmitted along the motor nerve fibers to the muscle endings, causing the potential on both sides of the muscle fiber membrane to change. This potential change is collected by electrodes on the surface of the skin to form sEMG signals that can be analyzed.
Compared with traditional mechanical sensing or optical motion capture technology, sEMG can directly reflect the process of neural regulation and muscle activation, especially in revealing the physiological mechanisms under pathological conditions such as nerve damage and muscle fatigue. For example, in the rehabilitation process of neurological diseases such as stroke and spinal cord injury, sEMG signals can sensitively capture the functional changes of neuromuscular pathways, providing an objective basis for clinical evaluation and rehabilitation training. In recent years, with the emergence of new sensing technologies such as high-density electrode arrays and flexible electronic materials, the spatial resolution and acquisition comfort of sEMG signals have been significantly improved, creating conditions for the analysis of complex movement patterns and the synchronous acquisition of multi-channel signals.
2.2 sEMG signal acquisition and preprocessing
High-quality acquisition of sEMG signals depends on advanced sensor design and signal-processing circuits. Although traditional Ag/AgCl wet electrodes perform well in terms of signal stability, they are prone to detachment and skin irritation when worn for long periods or in large-scale sports scenarios. Researchers have therefore been exploring new electrode materials and structures in recent years; for example, flexible conductive materials such as graphene and carbon nanotubes are now widely used in wearable sEMG sensors. In 2014, YaLi Zheng's team prepared a highly ductile strain sensor based on graphene: its stretchability exceeded 200%, its signal-to-noise ratio reached 35 dB, and an organic-inorganic composite photoelectric detection module was integrated to acquire sEMG and biomechanical parameters simultaneously. The emergence of this type of flexible sensor has greatly improved the wearing experience and signal quality compared with traditional electrodes.
In terms of miniaturization and multi-channel acquisition, the Trigno series of devices launched by Delsys in 2023 uses small electrodes and low-power wireless transmission technology, supports the parallel acquisition of 16-channel sEMG and three-axis acceleration signals, and the sampling frequency is increased to 4kHz, providing a solid hardware foundation for dynamic motion monitoring. At the same time, multimodal sensor fusion has become a new trend. The capacitive-optical hybrid sensor recently reported in 2024 can quantify muscle deformation through capacitance changes, and combines near-infrared spectroscopy (NIRS) to achieve real-time analysis of muscle oxygen metabolism status, which increases the sensitivity of muscle fatigue assessment by 40%, significantly better than traditional single-modality sensing solutions.
China has also made significant progress in high-density sEMG acquisition systems. The 128-channel array sEMG acquisition device designed by Li Yidong's team in 2015 uses a submodule architecture and WiFi wireless transmission to achieve high-concurrency, low-crosstalk data acquisition. Zhao Zhangyan and others proposed linear electrodes, printed electrodes, and spring-type probes in terms of electrode structure innovation, effectively solving the problems of traditional electrodes falling off and unstable signals, and reduced signal fluctuations by 70% through vector impedance testing and motion artifact filtering circuits.
The sEMG signal is essentially a non-stationary, low-amplitude bioelectric signal that is susceptible to noise interference, so signal preprocessing and feature extraction are key steps in extracting its effective information. Common preprocessing steps include removing DC components, bandpass filtering, removing power frequency interference, normalization, etc. In recent years, with the development of signal processing theory, nonlinear analysis methods such as variational mode decomposition (VMD), empirical mode decomposition (EMD), and the Hilbert-Huang transform (HHT) have been introduced into sEMG signal processing. These methods can effectively separate muscle activity components in different frequency bands and improve the signal-to-noise ratio and feature resolution of the signal.
In terms of feature extraction, time domain, frequency domain and time-frequency domain features are widely used. Time domain features such as root mean square (RMS), mean absolute value (MAV), waveform length (WL), etc. can reflect the overall intensity of muscle contraction; frequency domain features such as median frequency (MF) and mean power frequency (MPF) are used to analyze the changes in the spectrum during muscle fatigue. In recent years, combined with time-frequency analysis methods such as wavelet packet transform (WPT) and short-time Fourier transform (STFT), researchers have been able to capture the dynamic changes of muscle activation more carefully. For example, the instantaneous frequency feature extraction method based on EEMD-HT proposed by Cao Ang et al., combined with the band spectral entropy and PSO-SVM optimization algorithm, achieved a muscle fatigue classification accuracy of more than 90%, which is better than the traditional frequency domain method.
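As an illustration, the common time domain and frequency domain features mentioned above can be computed with a few lines of NumPy. The following is a minimal sketch under the assumption of a single-channel segment sampled at 1000 Hz; the function and variable names are illustrative, not taken from the study's own code.

import numpy as np

def time_domain_features(x):
    """Common time-domain features of an sEMG segment x (1-D array)."""
    rms = np.sqrt(np.mean(x ** 2))           # root mean square: overall contraction intensity
    mav = np.mean(np.abs(x))                 # mean absolute value
    wl = np.sum(np.abs(np.diff(x)))          # waveform length: cumulative amplitude change
    return rms, mav, wl

def frequency_domain_features(x, fs=1000):
    """Median frequency (MF) and mean power frequency (MPF) from the power spectrum."""
    spectrum = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mpf = np.sum(freqs * spectrum) / np.sum(spectrum)            # mean power frequency
    cumulative = np.cumsum(spectrum)
    mf = freqs[np.searchsorted(cumulative, cumulative[-1] / 2)]  # frequency splitting spectral power in half
    return mf, mpf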
In addition, dynamic signal segmentation and adaptive segmentation detection algorithms are also used for automatic segmentation of sEMG signals, which improves the accuracy of action recognition. Wu Yansheng proposed an adaptive segmentation detection algorithm based on rolling absolute value average, combined with wavelet decomposition and anti-shake technology, which reduced the false action detection rate from 15% to 5%.
Basic Theory of Deep Learning
With the rapid development of artificial intelligence technology, sEMG signal analysis has gradually evolved from traditional machine learning methods to deep learning. Early studies mostly used traditional classifiers such as support vector machines (SVM), linear discriminant analysis (LDA), and K-nearest neighbors (KNN), combined with manually extracted features for muscle fatigue or motion recognition. For example, Zhang Xu built an 8-channel sEMG real-time gesture recognition system through anatomically guided sensor layout optimization, and used SVM classifiers to achieve recognition of 20 kinds of fine gestures, with an online control delay of less than 200ms.
In recent years, the application of deep learning models in sEMG signal analysis has gradually increased. Cao Shuhao built a 32-layer 1D ResNet model based on the Swiss Ninapro database, achieving an accuracy of 85% in 50-category gesture classification tasks. The spatiotemporal attention network (TSAN) proposed in 2024 is based on the Transformer structure and achieved an accuracy of 92.3% in the 50-category gesture classification task of the Ninapro DB7 dataset, an increase of 11 percentage points over the traditional convolutional neural network. These studies show that deep learning models can automatically extract high-order features from raw signals and significantly improve classification performance.
In terms of multimodal data fusion, the MIT team proposed a sEMG-EEG-IMU multi-source data joint encoding framework, which uses the dynamic time warping (DTW) algorithm to achieve high-precision synchronization of cross-modal signals, reducing the error to 3.8% in the gait phase prediction task. In terms of high-density signal spatial analysis, the Noraxon Ultium system integrates 128-channel high-density sEMG technology and separates motor unit action potentials (MUAPs) through an independent component analysis (ICA) algorithm, with a spatial resolution of 1mm², providing a new idea for accurately locating the activation area of motor units.
2.5 Application of sEMG in rehabilitation and motor control
sEMG signals are increasingly used in rehabilitation medicine and motion control. In the field of nerve injury rehabilitation, sEMG can monitor the patient's muscle activation in real time and assist doctors in developing personalized rehabilitation plans. For example, the federated learning framework deployed on the Microsoft Azure platform supports distributed model training of more than 100,000 sEMG data, and can build patient-specific muscle coordination models, providing quantitative tools for the development of individualized rehabilitation plans. In the field of rehabilitation robot control, Sun Xin established an sEMG-based elbow angle BP neural network mapping model with a prediction error of less than 8°, and integrated it into a 5-DOF exoskeleton robot to achieve autonomous motion control for paralyzed patients.
In the field of sports science, sEMG can not only quantify muscle activation but also reveal fine details of movement, for example that shortening the activation delay of the latissimus dorsi by 10 milliseconds in swimming and jumping movements can increase take-off speed by 1.2%. As a biosignal interaction interface, sEMG also shows great potential in emerging applications such as brain-computer interfaces and the metaverse. The sEMG-EEG hybrid decoding system developed by Ottobock has compressed the control delay of bionic limbs to 120 ms and expanded the degrees of freedom of movement to 22, marking a major advance in fine motor control technology.
2.6 Current status and development trends of research at home and abroad
Looking at the current status of international research, developed countries such as Europe and the United States are in a leading position in high-end sEMG equipment, signal processing algorithms and intelligent rehabilitation systems. High-density, multi-channel sEMG systems launched by companies such as Delsys and Noraxon have been widely used in clinical and scientific research. At the same time, domestic researchers have made significant progress in high-density acquisition systems, low-cost wearable devices, and electrode material innovations, but there is still room for improvement in core algorithms, chip design, and market share of high-end equipment. At present, the domestic high-end sEMG equipment market is mainly monopolized by foreign brands, and the price of a single system is high, and there is an urgent need for domestic substitution.
In the future, with the continuous development of flexible electronics, artificial intelligence, and multimodal sensing technology, the acquisition, processing, and application of sEMG signals will become more intelligent and personalized. Multimodal data fusion, deep learning models, and personalized rehabilitation programs will become research hotspots. At the policy level, the "14th Five-Year Plan" lists intelligent rehabilitation equipment as a key development direction, and the rehabilitation medical equipment market is expected to exceed 100 billion yuan by 2025. Enhancing the core competitiveness of domestic sEMG equipment and promoting independent innovation in algorithms and chips will be the key to China's breakthrough in this field.
Chapter 3 Design of sEMG signal analysis model based on deep learning
3.1 Research framework and overall design
3.1.1 Data acquisition and original signal description
Based on a standardized sEMG data acquisition experiment, this study established a complete data analysis and modeling system through standardized file management and preprocessing. All raw data are stored in CSV format and divided into two categories, "fatigue" and "non fatigue", with each category placed in its corresponding folder. Each CSV file records the single-channel signal of one subject in a specific muscle movement state, sampled at 1000 Hz, with the main recorded column labeled "amplitudo". There are 26 samples in each category, so the sample sizes are balanced and representative. During data reading, the file list is obtained by traversing the folders, redundant suffixes are removed through a file-name standardization strategy, and file names are uniformly lower-cased to ensure uniqueness and consistency in subsequent batch processing. This structured management not only facilitates tracking and automated processing, but also lays a solid foundation for data preprocessing.
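As an illustration, the traversal and name normalization can be sketched as follows. The folder paths and helper names are assumptions made for this sketch; only the column name "amplitudo" and the 1000 Hz single-channel format come from the dataset description above.

import os
import pandas as pd

DATA_DIRS = {"fatigue": "data/fatigue", "non_fatigue": "data/non_fatigue"}  # assumed folder layout

def load_category(folder):
    """Read every CSV in a category folder into a dict keyed by a normalized file name."""
    signals = {}
    for fname in sorted(os.listdir(folder)):
        if not fname.lower().endswith(".csv"):
            continue
        key = os.path.splitext(fname)[0].lower().replace(" ", "_")   # normalize the file name
        df = pd.read_csv(os.path.join(folder, fname))
        signals[key] = df["amplitudo"].to_numpy()                    # main recorded column, 1000 Hz
    return signals

datasets = {label: load_category(path) for label, path in DATA_DIRS.items()}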
3.1.2 sEMG signal preprocessing process
On the basis of data management, in order to eliminate deviations introduced by instrument drift or environmental noise, this study designed a complete sEMG signal preprocessing pipeline. First, zero-baseline correction uses the mean of each raw signal as the correction reference and shifts the whole signal to near 0, effectively removing the DC component; then full-wave rectification (taking the absolute value of the signal) enhances the expression of signal energy. Because the raw signal is often mixed with irrelevant noise segments, the data are clipped and only the key time periods are retained: for example, the two most representative segments of the movement are selected and their index ranges strictly screened so that only the main stage of the signal is kept. The clipped signal then enters the normalization step, in which all sample amplitudes are mapped to the [0,1] interval, compensating for inherent differences between individuals and providing a unified scale for subsequent model training. Finally, Butterworth band-pass filtering (with a main frequency band of 10~100Hz) removes high-frequency noise and low-frequency drift, highlighting the effective components of the signal. This processing flow is invoked automatically through batch functions, and each signal file ultimately yields two high-quality data segments for time-frequency feature extraction and subsequent visualization.
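A minimal sketch of this pipeline using SciPy is given below. The filter order and the helper's name are illustrative assumptions; the 10–100 Hz passband and the processing order follow the description above.

import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(signal, fs=1000, band=(10, 100), segment=None):
    """Baseline correction, full-wave rectification, optional clipping,
    min-max normalization and Butterworth band-pass filtering."""
    x = signal - np.mean(signal)                        # zero-baseline correction (remove DC component)
    x = np.abs(x)                                       # full-wave rectification
    if segment is not None:                             # keep only the key time period
        x = x[segment[0]:segment[1]]
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)     # map amplitudes to [0, 1]
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    return filtfilt(b, a, x)                            # 10–100 Hz zero-phase band-pass filter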
3.1.3 Time-frequency feature extraction and visualization
Considering that sEMG signals are essentially non-stationary and their frequency components change over time, time-domain analysis alone cannot capture all dynamic features. This study therefore applies the short-time Fourier transform (STFT) to perform time-frequency analysis on the processed signals. Specifically, the signal is divided into multiple short-time windows, an instantaneous spectrum is obtained by a Fourier transform in each window, and the results of the windows are combined in chronological order into a two-dimensional matrix. With the help of graphical tools, the matrix is rendered as a color spectrogram in which the horizontal axis represents time, the vertical axis represents frequency, and color intuitively reflects the distribution of signal energy. To facilitate direct reading by subsequent deep learning models, all spectrograms are cropped, resized, and uniformly saved in PNG format with axes and labels removed, ensuring that the image information is clean and easy to archive. The differences between categories in the time-frequency images are quite obvious, which also provides a basis for automatic classification.
Figure 1 Two-dimensional time-frequency spectrum image
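The spectrogram generation and export step can be sketched as follows; the STFT window length, overlap, and figure size are illustrative assumptions, and scipy.signal together with matplotlib is used.

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import stft

def save_spectrogram(x, out_path, fs=1000, nperseg=256):
    """Short-time Fourier transform of a preprocessed segment, saved as an axis-free PNG."""
    f, t, Z = stft(x, fs=fs, nperseg=nperseg, noverlap=nperseg // 2)
    fig, ax = plt.subplots(figsize=(2.24, 2.24), dpi=100)     # roughly 224x224 pixels before resizing
    ax.pcolormesh(t, f, np.abs(Z), shading="gouraud")         # energy distribution over time and frequency
    ax.axis("off")                                            # remove axes and labels for model input
    fig.savefig(out_path, bbox_inches="tight", pad_inches=0)
    plt.close(fig)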
3.1.4 Dataset construction and enhancement
After the time-frequency images were generated, the study focused on dataset construction and enhancement strategies. All generated images are stored by category, and statistical checks ensure that the numbers of images in the two categories are balanced. The dataset is divided with the commonly used three-way split: all images are first randomly divided into training and test sets at an 8:2 ratio, and 20% of the training set is then randomly held out as the validation set. This division prevents data leakage and ensures objective evaluation. To improve the robustness and generalization of the model to different inputs, data augmentation techniques such as rotation, translation, scaling, mirror flipping, and brightness perturbation are applied to the training set, so that each original image produces diverse variants during training. All images are uniformly resized to 224×224-pixel RGB format so that the deep convolutional network receives input of fixed dimensions. Because an automatic loader reads the data in batches with real-time augmentation, there is no need to pre-store all augmented samples, which saves storage and increases the model's adaptability to real scenarios during training.
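A simplified sketch of the 8:2 file split is shown below, assuming the class folders produced in the previous step are named "spectrograms_fatigue" and "spectrograms_nonfatigue" (as in Section 3.4.1) and that the split images are copied into an assumed "dataset" directory; the 20% validation subset is held out later by the data loader.

import os, shutil
from sklearn.model_selection import train_test_split

def split_images(src_dir, dst_root, label, test_size=0.2, seed=42):
    """Copy spectrogram PNGs of one class into train/ and test/ folders at an 8:2 ratio."""
    files = sorted(f for f in os.listdir(src_dir) if f.endswith(".png"))
    train_files, test_files = train_test_split(files, test_size=test_size, random_state=seed)
    for subset, names in (("train", train_files), ("test", test_files)):
        out_dir = os.path.join(dst_root, subset, label)
        os.makedirs(out_dir, exist_ok=True)
        for name in names:
            shutil.copy(os.path.join(src_dir, name), out_dir)

split_images("spectrograms_fatigue", "dataset", "fatigue")
split_images("spectrograms_nonfatigue", "dataset", "nonfatigue")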
3.1.5 Deep Learning Model Training and Evaluation Process
In the deep learning model training and evaluation stage, this study mainly tested two schemes: a custom convolutional neural network and transfer learning models. The custom network uses multiple convolution, pooling, batch-normalization, and Dropout layers to extract image features with a small number of parameters, while the transfer learning models use pre-trained networks such as ResNet50 and VGG16 as feature extractors, adding only fully connected layers at the top while freezing the original network parameters, thereby leveraging large-scale pre-training to improve classification performance under small-sample conditions. All models use binary_crossentropy as the loss function at compile time, together with the Adam optimizer. To alleviate the adverse effect of class imbalance on training, the study also automatically calculates class weights so that the minority class receives more attention during training. During training, the loss and accuracy on the training and validation sets are monitored in real time, and an early-stopping mechanism is used to prevent overfitting and ensure the generalization ability of the network on the test set.
3.1.6 Research Flowchart
In order to intuitively present the training effect, the study also plotted loss functions and accuracy curves, and used confusion matrices, classification reports and other indicators to quantitatively evaluate the model output. The network structure was also visualized through special tools to display the network hierarchy and the connection status of each layer. All evaluation results provide detailed data support for subsequent model improvements and system optimization.
In general, the entire process, from data acquisition to signal preprocessing, to time-frequency feature extraction, visualization, data set construction and enhancement, and finally to the training and evaluation of the deep model, is carefully designed at each step to ensure data quality and model effect. Standardized data management and automated processing procedures not only reduce the errors that may be caused by human intervention, but also provide a scientific basis for repeated experiments. This systematic research framework not only solves the problems of noise and individual differences in the original signal, but also realizes efficient classification of fatigue status through deep learning technology, laying a solid foundation for future promotion in larger samples and actual application scenarios.
3.2 Network structure design
3.2.1 Network Input and Data Format
The network input data all come from the sEMG time-frequency images that have been preprocessed and feature extracted. These images are converted from the original electromyographic signals by short-time Fourier transform, which can simultaneously reflect the dynamic characteristics of time and frequency changes. In order to adapt to the current mainstream convolutional neural network, all images are uniformly adjusted to 224×224 pixels in RGB three-channel format. In practice, we store the generated images in folders representing fatigue and non-fatigue, and then use the flow_from_directory method of ImageDataGenerator to load them in batches, while performing real-time image enhancement and normalization during the loading process. Among them, the image pixel values are normalized to the range of [0,1] through the rescale parameter to improve the stability and convergence efficiency of the model during training. The stratified sampling strategy is adopted in the data set division to ensure that the category ratios between the training, validation and test sets are consistent, thus providing a scientific and repeatable input data basis for the model.
3.2.2 Customized Convolutional Neural Network Structure
For the network structure, we built a streamlined custom convolutional neural network. The input layer accepts the preprocessed 224×224×3 image, after which two groups of convolutional and pooling layers extract low- and mid-level spatial features. The first group uses 32 3×3 convolution kernels with "same" padding so that the output size stays close to the input, followed by batch normalization and ReLU activation before a max-pooling layer. The second group increases the number of channels to 64 and repeats a similar process to mine more complex features. The feature maps are then flattened by a Flatten layer and connected to a 128-unit fully connected layer, again with batch normalization and ReLU activation, and Dropout (set to 0.5) reduces the dependence on individual units in the fully connected layer to prevent overfitting. Finally, an output layer consisting of a single neuron with a Sigmoid function produces the fatigue/non-fatigue binary classification. At compile time, binary_crossentropy is selected as the loss function, Adam as the optimizer, and the learning rate is set according to experimental tuning. This design extracts the spatial features of the time-frequency images while taking into account the training requirements of a small dataset and limited computing resources.
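A Keras sketch of this custom network is given below; the pooling sizes are illustrative assumptions, and the learning rate follows the value reported in Section 3.4.2.

from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam

model = models.Sequential([
    layers.Conv2D(32, (3, 3), padding="same", input_shape=(224, 224, 3)),  # first group: 32 channels
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), padding="same"),                             # second group: 64 channels
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128),
    layers.BatchNormalization(),
    layers.Activation("relu"),
    layers.Dropout(0.5),                                                   # reduce dense-layer co-adaptation
    layers.Dense(1, activation="sigmoid"),                                 # fatigue / non-fatigue probability
])
model.compile(optimizer=Adam(learning_rate=1e-6),
              loss="binary_crossentropy", metrics=["accuracy"])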
3.2.3 Transfer Learning Model (VGG16, ResNet50)
We also introduced a transfer learning strategy to further improve the generalization performance under small sample data. Taking ResNet50 as an example, its pre-trained model weights use the parameters trained with the ImageNet dataset. When loading the pre-trained model, we removed the fully connected classification part of the top layer, retained only the convolutional backbone network, and kept the original feature extraction capability by freezing the backbone network parameters. Subsequently, a global average pooling layer was added after the backbone network, and then a 128-unit fully connected layer, a Dropout layer, and a final output layer were connected to achieve the binary classification task. Similarly, the VGG16 model, after loading its pre-trained convolutional layer, added a custom fully connected part at the top layer, and both models used the same data augmentation strategy as the custom CNN during training to achieve a unified training environment. In this way, with the help of the common edge, texture and other features learned by the pre-trained model on a large dataset, the classification accuracy and stability of the model in small sample scenarios can be significantly improved.
(1) ResNet50 transfer learning model
Load the ImageNet pre-trained weights through ResNet50(weights='imagenet', include_top=False, input_shape=(height, width, 3)), keeping only the convolution part (excluding the top fully connected layer).
Freeze all parameters of the ResNet50 backbone network (base_model.trainable = False) and only train the top custom fully connected layer to prevent overfitting.
A global average pooling layer (GlobalAveragePooling2D), a 128-unit fully connected layer (Dense), Dropout (0.5), and a Sigmoid output layer are added after the backbone network to achieve binary classification.
When compiling the model, the Adam optimizer is used with a learning rate of 1e-4, the loss function is binary_crossentropy, and validation set performance is monitored in real time during training; a code sketch of this construction is given at the end of this subsection.
(2) VGG16 transfer learning model
Load the pre-trained convolutional layer of VGG16 (VGG16(weights='imagenet', include_top=False, input_shape=(height, width, 3))), freeze the parameters, and only train the top custom fully connected layer.
The top-level structure is similar to ResNet50, including global average pooling, full connection, Dropout and Sigmoid output layers.
The transfer learning model can make full use of common features such as edges and textures learned from large-scale data sets to improve the model's recognition ability of sEMG time-frequency images.
(3) Training and evaluation process
During the training process, the same data augmentation and partitioning strategy as the custom CNN is adopted to ensure the fairness of the comparative experiment.
The number of training epochs is set between 10 and 100. Model performance is monitored on the validation set, and strategies such as EarlyStopping are used to terminate training early and prevent overfitting.
Evaluate the model's accuracy, confusion matrix, classification report and other indicators on the test set to comprehensively measure the model performance.
By introducing the transfer learning model, this study not only improved the classification accuracy of the model, but also enhanced the generalization ability and robustness of the model in small sample scenarios. The comparative experiment between transfer learning and custom CNN provides strong theoretical and practical support for subsequent model optimization and practical application.
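A sketch of the ResNet50-based transfer model described in (1) is given below; layer sizes follow the description above, and replacing ResNet50 with VGG16 (and the corresponding import) yields the VGG16 variant in (2).

from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam

base_model = ResNet50(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False                        # freeze the pre-trained backbone

model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),          # binary fatigue classification
])
model.compile(optimizer=Adam(learning_rate=1e-4),   # higher rate: only the new top layers are trained
              loss="binary_crossentropy", metrics=["accuracy"])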
3.2.4 Network Regularization and Measures to Prevent Overfitting
In terms of network regularization and prevention of overfitting, this study has taken a variety of measures. The internal regularization measure is reflected in the Dropout mechanism. Dropout (0.5) is used after the fully connected layer to randomly inactivate some neurons, thereby reducing the dependence between neurons and enhancing the robustness of the model. On the other hand, Batch Normalization is widely used in various convolutional modules and fully connected layers to solve the problem of internal covariate shift and accelerate the gradient descent process. At the data level, we use data augmentation technology as a supplement, and use ImageDataGenerator to achieve random rotation, translation, scaling, flipping and brightness changes of images, thereby expanding the training set and reducing the risk of overfitting for a single sample. In addition, during the training process, the early stopping callback function (EarlyStopping) and the learning rate decay (ReduceLROnPlateau) strategy are also used: when the validation set loss no longer decreases within a number of consecutive epochs, the training is automatically stopped or the learning rate is reduced to keep the model in the best state. These multiple measures together constitute a complete set of strategies to prevent overfitting, ensuring that the model not only performs well on the training data, but also shows a high generalization ability on the test data.
2. Batch Normalization
Batch normalization is used to solve the problem of input distribution changes (internal covariate shift) in different layers. It normalizes the input of each minibatch to stabilize the input distribution of each layer of the network. In the code, a BatchNormalization layer is embedded after each convolutional layer and fully connected layer:
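For example (a schematic excerpt, where model denotes the Sequential network being built in Section 3.2.2):

from tensorflow.keras import layers

model.add(layers.Conv2D(64, (3, 3), padding="same"))
model.add(layers.BatchNormalization())    # normalize the mini-batch before the activation
model.add(layers.Activation("relu"))
# ...
model.add(layers.Dense(128))
model.add(layers.BatchNormalization())    # also applied after the fully connected layer
model.add(layers.Activation("relu"))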
This approach not only speeds up the convergence of the network, but also plays a regularization role to a certain extent, helping to prevent overfitting by reducing internal covariate shifts.
3. Data Augmentation
In addition to the regularization measures within the network structure, at the data level, we use data augmentation technology to expand the training samples. Data augmentation simulates various changes in the natural environment by performing random rotation, translation, scaling, flipping, brightness adjustment and other operations on the training samples, so that the model can learn more robust features when facing more diverse inputs. ImageDataGenerator is used in the code to implement this strategy, for example:
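The augmentation ranges below are illustrative assumptions; the set of operations follows the description above.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,            # normalize pixel values to [0, 1]
    rotation_range=15,            # random rotation (degrees)
    width_shift_range=0.1,        # random horizontal translation
    height_shift_range=0.1,       # random vertical translation
    zoom_range=0.1,               # random scaling
    horizontal_flip=True,         # mirror flipping
    brightness_range=(0.8, 1.2))  # brightness perturbation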
Data augmentation effectively expands the original samples, smoothes the uncertainty of data distribution during training, and also plays a positive role in preventing model overfitting.
4. Early Stopping and Learning Rate Adjustment
During the training process, we used callback functions such as EarlyStopping and ReduceLROnPlateau. These strategies can automatically stop training or reduce the learning rate when the loss of the monitoring validation set no longer decreases, thereby preventing the model from falling into overfitting or local optimality.
These strategies dynamically adjust the model parameter update rhythm during training, which helps the model better generalize to the test set data.
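A sketch of how these callbacks can be configured is shown below; the patience values and the learning-rate factor are illustrative assumptions.

from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

callbacks = [
    EarlyStopping(monitor="val_loss", patience=5, restore_best_weights=True),    # stop when validation loss stalls
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=3, min_lr=1e-7),  # halve the learning rate on plateau
]
# history = model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=callbacks)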
5. Regularization parameter setting
In some fully connected layers or convolutional layers, you can also directly set L2 regularization (weight decay). Although we mainly rely on Dropout and BatchNormalization in this study, setting L2 regularization is also a common practice. For example, kernel_regularizer=l2(0.01) can be added to the Dense layer to limit excessive updates of weights and further suppress overfitting.
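For example, a hypothetical dense layer with L2 weight decay could be declared as:

from tensorflow.keras import layers
from tensorflow.keras.regularizers import l2

dense = layers.Dense(128, activation="relu", kernel_regularizer=l2(0.01))  # L2 weight decay on the dense layer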
Through the above-mentioned regularization and anti-overfitting measures, this study adopted a full range of protection from internal structure to training strategy in network design to ensure that the network does not overfit on the training data, while having good generalization performance, providing higher accuracy and robustness for the final fatigue state classification.
3.2.5 Network structure visualization and model interpretation
In order to understand the entire network structure more intuitively, this paper also visualizes and explains the model. The visualkeras library can be used to generate a hierarchical network structure diagram, which details the type, output size, and number of parameters of each layer. This not only provides intuitive materials for paper writing, but also helps with subsequent model debugging. In the visualization diagram, from the input layer to the convolution layer, pooling layer, and then to the fully connected layer and output layer, the structure and connection relationship of each part are clear at a glance. At the same time, the built-in plot_model function of TensorFlow is also used as an auxiliary to save the network structure diagram, which helps to cross-validate whether the parameter configuration of each layer is accurate. Through these methods, we not only verified the rationality of the model design, but also provided a basis for discovering possible redundant layers or parameter bottlenecks during the debugging process. Combined with actual training logs, loss and accuracy curves, and visualization charts such as confusion matrices, the performance and shortcomings of the model in the fatigue state classification task are further explained, providing specific improvement directions for subsequent network optimization.
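A minimal sketch of the two visualization calls is given below; the output file names are illustrative, and plot_model additionally requires the pydot and graphviz packages to be installed.

from tensorflow.keras.utils import plot_model
import visualkeras

plot_model(model, to_file="model_structure.png", show_shapes=True, show_layer_names=True)  # layer graph with shapes
visualkeras.layered_view(model, to_file="model_layers.png", legend=True)                   # layered block diagram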
3.3 Loss Function and Optimization Strategy
In the training process of deep learning models, loss functions and optimization strategies play a crucial guiding role in achieving accurate classification of fatigue and non-fatigue states. For the task of binary classification of sEMG signals in this study, this paper not only considers the matching degree between the output probability and the true label when building the model, but also pays more attention to guiding the model to converge to the ideal state to the maximum extent in actual scenarios with small samples and incompletely balanced data distribution. To this end, a systematic design was carried out in many aspects such as loss function, optimizer, learning rate setting, and class imbalance processing.
3.3.1 Choice of loss function (binary_crossentropy)
This study uses binary cross entropy as the loss function. The basic formula of binary cross entropy can be expressed as:
L = -[y·log(p) + (1-y)·log(1-p)]
Among them, y represents the true label (0 or 1), and p is the probability of the positive class (fatigue) predicted by the model. The main reason for using this loss function is that it is highly sensitive to the output probability and imposes a large penalty on prediction errors, forcing the model to correct deviations as early as possible. This loss function is used in both the custom convolutional neural network and the transfer learning models (such as ResNet50 and VGG16) to ensure consistency of the training objective. Practice shows that with binary cross entropy the model can more accurately capture the subtle distribution differences between signals in different states, thereby improving classification performance.
3.3.2 Optimizer selection and parameter setting (Adam and its learning rate)
For the optimizer, this study uses Adam, which combines the advantages of momentum and adaptive learning-rate adjustment, allowing the model to approach the convergence region quickly in the early stage of training while remaining stable in the later stage. The core of Adam is to dynamically adjust the learning rate of each parameter, which is particularly important for high-dimensional, complex time-frequency feature images. Specifically, for the custom CNN model we set a lower learning rate to avoid gradient oscillation, whereas for the transfer learning models, whose pre-trained weights already provide good feature extraction, only the top layers need fine-tuning, so a relatively higher learning rate is chosen to accelerate adaptation to the new task. In our experimental comparisons, the Adam optimizer showed faster convergence and better classification metrics than traditional SGD on this task. Code example:
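A minimal compile sketch consistent with the learning rates reported in Section 3.4.2; the model variable names cnn_model and transfer_model are illustrative.

from tensorflow.keras.optimizers import Adam

# Custom CNN: very low learning rate to avoid gradient oscillation
cnn_model.compile(optimizer=Adam(learning_rate=1e-6),
                  loss="binary_crossentropy", metrics=["accuracy"])

# Transfer learning model: only the new top layers are trained, so a larger rate is used
transfer_model.compile(optimizer=Adam(learning_rate=1e-4),
                       loss="binary_crossentropy", metrics=["accuracy"])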
3.3.3 Learning rate and training strategy (fixed/dynamic learning rate, EarlyStopping, etc.)
The learning rate is a key hyperparameter that affects the model training speed and final performance. In the initial stage, this study sets different fixed learning rates according to the model structure and data volume, and then dynamically adjusts them during the training process by monitoring the validation set loss. Although the implementation in the code of this article does not explicitly introduce dynamic callback functions such as ReduceLROnPlateau, in actual operation, we will pay attention to the training curve and manually intervene in the learning rate when necessary to prevent too high a learning rate from causing unstable training or too low a learning rate from causing slow convergence.
3.3.4 Class imbalance processing (automatic calculation and application of class_weight)
In actual collection, although fatigue and non-fatigue samples are roughly balanced, there may be a slight deviation in quantity after segmentation. To this end, this paper adopts a method of automatically calculating category weights, and uses the sklearn tool to determine the weight of each category in the loss according to the distribution of training set labels. In this way, during the training process, the model can automatically increase its attention to the minority class to avoid the situation where the model tends to the majority class due to data skew. After this dynamic adjustment, the experiment shows that the recall rate and F1 score of the model on the minority class have been significantly improved, which fully proves the effectiveness of the strategy.
The specific steps are as follows:
Use `sklearn.utils.class_weight` to automatically calculate weights based on the training set labels.
When training the model, the calculated class weights are passed to the `fit` function through the `class_weight` parameter.
The code is implemented as follows:
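The sketch below assumes the training data are loaded with flow_from_directory, which exposes integer labels through its classes attribute.

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

train_labels = train_ds.classes                         # integer labels provided by flow_from_directory
weights = compute_class_weight(class_weight="balanced",
                               classes=np.unique(train_labels),
                               y=train_labels)
class_weights = dict(enumerate(weights))                # e.g. {0: 1.04, 1: 0.96}

# passed to training via the class_weight parameter:
# history = model.fit(train_ds, epochs=10, validation_data=val_ds, class_weight=class_weights)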
3.3.5 Training process monitoring and regularization (validation set monitoring, Dropout, BatchNorm)
To further improve the generalization ability of the model and prevent overfitting on the training data, this paper introduces several regularization measures into the network design and training process. First, in the fully connected layers, Dropout randomly discards some neurons, reducing the model's excessive dependence on specific nodes and making it more robust to new samples. Second, Batch Normalization layers are embedded after each convolutional layer and fully connected layer; this stabilizes the input distribution of each layer, accelerates training, and also plays a certain regularization role. In addition, at the data level, we adopt a data augmentation strategy that expands the training samples through random rotation, translation, scaling, flipping, and brightness perturbation, enabling the model to learn robust features from more diverse image inputs and further suppressing overfitting. During training, to keep the metrics consistent between the training and validation sets, we also plot the loss and accuracy curves and adjust the training strategy promptly by continuously monitoring performance.
Add a Dropout layer (such as `Dropout(0.5)`) after the fully connected layer, and randomly discard 50% of the neurons during each training to reduce the dependency between neurons and effectively prevent overfitting.
A BatchNormalization layer is added after the convolutional layer and the fully connected layer to stabilize the input distribution of each layer, speed up the convergence speed, and play a certain regularization role.
Use `ImageDataGenerator` to perform various enhancement operations (rotation, translation, scaling, flipping, brightness perturbation, etc.) on the training set images, which greatly improves the generalization ability of the model.
During the training process, the loss/accuracy curves of the training set and validation set are monitored in real time to prevent overfitting. The code uses `matplotlib` to draw the training and validation loss curves to facilitate observation of whether the model is overfitting or underfitting.
3.3.6 Comprehensive training and optimization process
Combining the above measures, the training and optimization process of this study can be summarized as follows: in the model compilation stage, binary_crossentropy is used as the loss function, the Adam optimizer adjusts parameter updates, and a suitable initial learning rate is set; the training data are augmented in real time through ImageDataGenerator, and automatically calculated class weights balance the class influence in the loss function; Dropout, BatchNormalization, and other layers are integrated into the model structure to prevent overfitting; and finally, monitoring callbacks adjust the learning rate in time and an early-stopping strategy keeps the training process stable and orderly. The model ultimately achieved an ideal classification effect on the test set, with accuracy, recall, precision, F1 score, and other indicators all at a high level, proving the effectiveness of the overall optimization strategy.
3.4 Model training and evaluation
This section will introduce in detail the whole process of training and evaluating the deep learning-based sEMG muscle fatigue classification model in this study. The content covers the scientific division and loading of the data set, the training process and parameter setting, performance evaluation indicators, training process visualization, and result analysis and discussion. All contents are closely combined with actual code implementation to ensure the combination of theory and practice.
3.4.1 Dataset division and loading
First, the division and loading of the dataset form the basis of model training. After all preprocessing and time-frequency spectrogram generation, we store the images in the "spectrograms_fatigue" and "spectrograms_nonfatigue" folders according to their categories. Then, using the os and shutil libraries in Python, these images are divided into training and test sets at a ratio of 8:2, and 20% of the samples are randomly selected from the training set as the validation set. In the division stage, we strictly keep the proportion of each category consistent across subsets, which ensures balanced sample distribution and provides an objective basis for subsequent model evaluation. During data loading, the flow_from_directory method of ImageDataGenerator loads the images automatically in batches by directory. In the code we set the following augmentation parameters: images are normalized to the [0,1] interval on loading and uniformly resized to 224×224 RGB three-channel format; the training set additionally undergoes rotation, translation, scaling, random flipping, and brightness perturbation, so that each original image yields several variants during training and the sample space is maximized, while the test set receives only minimal processing so that the evaluation reflects the actual application scenario. The following code shows the implementation of data augmentation and loading.
Code Implementation
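The sketch below assumes the directory layout produced by the file split in Section 3.1.4 ("dataset/train" and "dataset/test"); the batch size and augmentation ranges are illustrative assumptions.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255, rotation_range=15, width_shift_range=0.1, height_shift_range=0.1,
    zoom_range=0.1, horizontal_flip=True, brightness_range=(0.8, 1.2),
    validation_split=0.2)                                  # hold out 20% of the training set for validation
test_datagen = ImageDataGenerator(rescale=1.0 / 255)       # test images are only rescaled

train_ds = train_datagen.flow_from_directory("dataset/train", target_size=(224, 224),
                                             batch_size=32, class_mode="binary", subset="training")
val_ds = train_datagen.flow_from_directory("dataset/train", target_size=(224, 224),
                                           batch_size=32, class_mode="binary", subset="validation")
test_ds = test_datagen.flow_from_directory("dataset/test", target_size=(224, 224),
                                           batch_size=32, class_mode="binary", shuffle=False)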
Through this process, the dataset not only achieves stratified sampling, but also comprehensively considers dataset expansion and normalization during the loading process, providing a solid foundation for subsequent training.
3.4.2 Training process and parameter setting
Next, in the model training phase, we systematically trained the custom CNN model and the transfer learning model respectively. The custom CNN model mainly contains multiple layers of convolution, pooling, batch normalization, activation and fully connected layers. At the same time, the Dropout mechanism (such as Dropout(0.5)) is used after the fully connected layer to reduce the risk of overfitting; while the transfer learning model uses the pre-trained convolution layer of ResNet50 or VGG16 as the feature extractor, and only adds the global average pooling, fully connected and Dropout layers to the top layer. After freezing the weights of the backbone network, only the newly added layers are trained. The input size of all models is fixed to 224×224×3, and the output uses Sigmoid activation to achieve binary classification. When compiling the model, we uniformly selected binary_crossentropy as the loss function and used the Adam optimizer to update the parameters. For the custom CNN, the initial learning rate is set low (for example, 0.000001), while the transfer learning only needs to fine-tune the top-level parameters, so the learning rate is appropriately increased (for example, 0.0001). In the code, the model compilation part is as follows:
model.compile(optimizer=Adam(learning_rate=0.000001), loss='binary_crossentropy', metrics=['accuracy'])
history = model.fit(train_ds, epochs=10, validation_data=val_ds, class_weight=class_weights)
Among them, class_weights is automatically calculated by the sklearn tool to balance the contribution of each category. This allows the model to pay more attention to minority class samples when the number of categories is relatively uneven, thereby improving the overall classification performance. In addition, to prevent the model from overfitting during training, we introduced Dropout and BatchNormalization layers in each layer, and used the EarlyStopping strategy and dynamic learning rate adjustment callback function to monitor the training process. Real-time monitoring of the training and validation loss and accuracy curves is an important basis for judging whether the model is overfitting. If the validation set loss suddenly increases in the later stage of training, it may be necessary to stop training in advance or reduce the learning rate to stabilize parameter updates.
3.4.3 Performance Evaluation Indicators
In order to comprehensively evaluate the performance of the model, this study introduced multiple indicators. The most commonly used accuracy directly reflects the proportion of correct classifications of the model; further, we use the confusion_matrix tool to construct a confusion matrix to observe the classification details of the model from four perspectives: true positive, false positive, true negative, and false negative; in addition, through classification_report, the system outputs comprehensive indicators such as precision, recall, and F1 score to comprehensively judge the performance of the model on fatigue and non-fatigue samples. The specific evaluation code is as follows:
Code implementation
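A sketch of the evaluation step, assuming the test generator was created with shuffle=False as in Section 3.4.1 so that predictions and labels stay aligned.

import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

test_loss, test_acc = model.evaluate(test_ds)               # overall accuracy on the test set
y_prob = model.predict(test_ds)                              # predicted fatigue probabilities
y_pred = (y_prob > 0.5).astype(int).ravel()                  # threshold at 0.5 for the binary decision
y_true = test_ds.classes                                     # ground-truth labels from the directory iterator

print("Test accuracy:", test_acc)
print(confusion_matrix(y_true, y_pred))                      # TP / FP / TN / FN breakdown
print(classification_report(y_true, y_pred,
                            target_names=list(test_ds.class_indices.keys())))  # precision, recall, F1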
Through the above multi-dimensional evaluation, the classification ability and practical application value of the model can be comprehensively and objectively reflected.
3.4.4 Visualization of the training process
In order to intuitively understand the training process, we also use matplotlib to draw the loss curves of the training set and the validation set. The graph shows the loss change trend of each epoch, which can not only confirm whether the training is stable, but also judge whether there is overfitting or underfitting by the difference between the curves. The code example is as follows:
Code implementation
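A minimal plotting sketch using the history object returned by model.fit; figure size and titles are illustrative.

import matplotlib.pyplot as plt

plt.figure(figsize=(6, 4))
plt.plot(history.history["loss"], label="training loss")
plt.plot(history.history["val_loss"], label="validation loss")
plt.xlabel("Epoch")
plt.ylabel("Loss")
plt.legend()
plt.title("Training and validation loss per epoch")
plt.show()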
Through the above visualization methods, abnormal situations in the training process can be discovered in time, guiding the optimization adjustment of model structure and parameters.
3.4.5 Results Analysis and Discussion
In the result analysis and discussion, we not only quantitatively evaluated the accuracy, confusion matrix, and classification report of the model on the test set, but also combined the comparative experiments of different models (custom CNN and transfer learning model) to analyze the impact of data enhancement, category weights, and regularization strategies on the overall performance. If the recall rate or precision rate on a certain category is not good, it may be necessary to further optimize feature extraction or adjust the network structure. In addition, through ablation experiments, we can gradually check the contribution of each module to the model performance, which is of great significance for a deep understanding of the internal working mechanism of the model.
Chapter Summary
This chapter systematically describes the entire process of designing and implementing a deep learning-based sEMG signal muscle fatigue state classification model. Through a detailed introduction to key links such as raw signal acquisition, preprocessing, feature extraction, data set construction, model design, training and evaluation, this paper fully demonstrates the innovation and scientificity of this study at the theoretical and practical levels. The following is a summary and induction of the main work of this chapter.
In the data collection and original signal description section, the source, structure and basic information of the sEMG dataset used in this study are clarified. The dataset contains two categories, "fatigue" and "non fatigue", which are stored in different folders. The number of samples in each category is balanced, ensuring the scientificity and representativeness of subsequent experiments. Through code statistics and file name normalization, a solid foundation is laid for subsequent batch processing and automated analysis.
In the signal preprocessing process, in view of the fact that sEMG signals are easily affected by noise and baseline drift, a multi-step processing process including zero baseline correction, absolutization, interval interception, normalization and Butterworth bandpass filtering was designed. Each step is implemented through a custom function, and all samples are processed in batches, which greatly improves the signal-to-noise ratio and comparability of the signal. The preprocessed signal is not only of higher quality, but also provides a reliable data foundation for subsequent feature extraction and modeling.
The short-time Fourier transform (STFT) is used in the time-frequency feature extraction and visualization process to convert the one-dimensional time series signal into a two-dimensional time-frequency spectrum image (spectrogram). All images are of uniform size and the coordinate axes are removed to facilitate direct reading by the deep learning model. Through STFT, the time and frequency characteristics of the signal can be fully displayed, providing rich information for the model to automatically extract complex features. The experiment found that there are obvious differences in energy distribution and spectral structure between the time-frequency images under fatigue and non-fatigue states, which provides a theoretical basis for the effective classification of the model.
In the data set construction and enhancement part, a combination of stratified sampling and data enhancement is used to scientifically divide the training set, validation set, and test set. ImageDataGenerator is used to perform various enhancement operations on the training set images (such as rotation, translation, scaling, flipping, brightness perturbation, etc.), which greatly improves the generalization ability of the model. All images are unified in 224×224×3 RGB format, and the labels are encoded in binary classification to ensure the standardization of the model input data and the reproducibility of the experiment.
In terms of deep learning model design, we built custom convolutional neural networks (CNNs) and transfer learning models (such as ResNet50 and VGG16). The custom CNN structure is concise and efficient, suitable for small and medium-sized data sets, and can effectively extract the spatial features of time-frequency images. The transfer learning model makes full use of the common features learned by pre-trained models on large-scale data sets (such as ImageNet), significantly improving the classification performance in small sample scenarios. All models use the binary_crossentropy loss function and Adam optimizer, combined with regularization measures such as category weights, Dropout, and BatchNormalization, to ensure the efficiency and stability of training.
In the loss function and optimization strategy section, key strategies such as loss function, optimizer, learning rate, category imbalance processing, regularization and training monitoring are elaborated in detail. By reasonably selecting loss function and optimizer, scientifically setting learning rate, automatically calculating category weights, and adopting a variety of regularization measures (such as Dropout, BatchNormalization, data enhancement, etc.), the training efficiency, stability and generalization ability of the model are greatly improved. During the training process, the loss and accuracy of the training set and validation set are monitored in real time, and callback functions such as EarlyStopping are combined to optimize the training process to prevent overfitting.
In the model training and evaluation phase, scientific data set division and loading, reasonable training parameter settings, multi-dimensional performance evaluation indicators (accuracy, confusion matrix, classification report, etc.), intuitive training process visualization, and in-depth results analysis and discussion were used to fully verify the effectiveness and practical application value of the model. By comparing the classification performance of the custom CNN and transfer learning models, and analyzing the impact of different structures on sEMG signal classification, it provides strong theoretical and practical support for subsequent model optimization and actual deployment.
In summary, this chapter not only introduces in detail the design and implementation process of the deep learning-based sEMG signal muscle fatigue state classification model, but also ensures the scientificity and reproducibility of each step through a large number of code implementations and experimental verifications. Through multi-step preprocessing, feature extraction, data enhancement and deep learning modeling, the accuracy and practicality of sEMG signal muscle movement state analysis have been greatly improved. The above processes and methods have laid a solid theoretical and practical foundation for subsequent experimental results analysis, model optimization and practical application promotion.
The work in this chapter provides a complete technical route and theoretical support for the experimental design, result analysis and model optimization in subsequent chapters, and also provides useful reference and reference for the in-depth application of sEMG signals in intelligent rehabilitation, motion monitoring and other fields.