
Domestic research status

In China, sEMG-related technologies have also advanced considerably. In terms of high-density acquisition systems, Li Yidong (2015) designed a 128-channel array sEMG acquisition device with a submodule architecture (8 independent acquisition modules plus a data fusion module), realizing parallel acquisition at a 1 kHz sampling rate with wireless data transmission over WiFi; a Butterworth notch filter was introduced to suppress power-frequency interference, and crosstalk between channels was kept below 5%. In the development of low-cost wearable devices, Wansha et al. (2012) developed a multi-channel sEMG sensing system based on the LabVIEW platform, which integrated preprocessing circuits and data interface boards, collected 4-channel signals in real time, and performed finger-force tracking analysis; the system cost only about 20% of comparable imported equipment. In terms of electrode technology innovation, Zhao Zhangyan (2010) developed linear electrodes, printed electrodes, and spring-type probes, which effectively solved the problem of traditional Ag/AgCl electrodes detaching easily, verified electrode stability through vector impedance testing (<10 kΩ @ 50 Hz), and developed a motion artifact filtering circuit that reduced baseline signal fluctuation by 70%.

In terms of signal processing and feature extraction, nonlinear feature modeling has gradually become mainstream. Cao Ang et al. (2018) proposed an instantaneous frequency feature extraction method based on EEMD-HT (ensemble empirical mode decomposition combined with the Hilbert transform); together with band spectral entropy (BSE) and a PSO-SVM optimization algorithm, it achieved more than 90% muscle fatigue classification accuracy, outperforming the traditional frequency-domain method (78%). Luo Zhizeng et al. (2010) used the wavelet packet transform (WPT) to extract subband energy features and, combined with an LVQ neural network, recognized four types of hand movements (such as wrist extension and wrist flexion) with 96% accuracy, significantly higher than the single frequency-domain method (82%). In terms of dynamic signal segmentation, Wu Yansheng (2019) proposed an adaptive segmented detection algorithm based on a rolling absolute-value average, combined with six-layer wavelet decomposition and anti-shake processing, reducing the false action detection rate from 15% to 5%.

In terms of classification algorithms and application systems, traditional machine learning methods still have certain advantages. Zhang Xu (2010) built an 8-channel sEMG real-time gesture recognition system through anatomically guided sensor layout optimization, and used SVM classifiers to achieve recognition of 20 types of fine gestures, with an online control delay of less than 200ms. In the preliminary exploration of deep learning, Cao Shuhao (2019) built a 32-layer 1D ResNet model based on the Swiss Ninapro database, and achieved an accuracy of 85% in 50 types of gesture classification tasks, although it was limited by the sample size (only 12,000 samples). In the field of rehabilitation robot control, Sun Xin (2010) established an sEMG-based elbow angle BP neural network mapping model with a prediction error of less than 8°, and integrated it into a 5-DOF exoskeleton robot to achieve autonomous motion control for paralyzed patients.

Chapter 2 sEMG Signal Basics and Deep Learning Theory


2.1 Physiological basis of sEMG signals

As a key bioelectric signal reflecting human muscle activity, surface electromyography (sEMG) has gradually become an important tool for studying neuromuscular function since it was introduced into the field of sports physiology and rehabilitation medicine in the 1960s. The generation of sEMG signals originates from the electrical stimulation of skeletal muscle fibers by motor neurons, which in turn triggers muscle contraction. Whenever the brain issues a movement command, nerve impulses are transmitted along the motor nerve fibers to the muscle endings, causing the potential on both sides of the muscle fiber membrane to change. This potential change is collected by electrodes on the surface of the skin to form sEMG signals that can be analyzed.

Compared with traditional mechanical sensing or optical motion capture technology, sEMG can directly reflect the process of neural regulation and muscle activation, especially in revealing the physiological mechanisms under pathological conditions such as nerve damage and muscle fatigue. For example, in the rehabilitation process of neurological diseases such as stroke and spinal cord injury, sEMG signals can sensitively capture the functional changes of neuromuscular pathways, providing an objective basis for clinical evaluation and rehabilitation training. In recent years, with the emergence of new sensing technologies such as high-density electrode arrays and flexible electronic materials, the spatial resolution and acquisition comfort of sEMG signals have been significantly improved, creating conditions for the analysis of complex movement patterns and the synchronous acquisition of multi-channel signals.

2.2 sEMG signal acquisition and preprocessing

High-quality acquisition of sEMG signals is inseparable from advanced sensor design and signal processing circuits. Although traditional Ag/AgCl wet electrodes perform well in terms of signal stability, they are prone to falling off and causing skin irritation when worn for long periods or in large-scale sports scenarios. To this end, researchers have been exploring new electrode materials and structures in recent years. For example, flexible conductive materials such as graphene and carbon nanotubes are widely used in the development of wearable sEMG sensors. In 2014, YaLi Zheng's team prepared a highly stretchable graphene-based strain sensor whose stretchability exceeded 200% and whose signal-to-noise ratio reached 35 dB, and integrated an organic-inorganic composite photoelectric detection module to acquire sEMG and biomechanical parameters simultaneously. The emergence of this type of flexible sensor has greatly improved the wearing experience and signal quality compared with traditional electrodes.

In terms of miniaturization and multi-channel acquisition, the Trigno series of devices launched by Delsys in 2023 uses small electrodes and low-power wireless transmission technology, supports the parallel acquisition of 16-channel sEMG and three-axis acceleration signals, and the sampling frequency is increased to 4kHz, providing a solid hardware foundation for dynamic motion monitoring. At the same time, multimodal sensor fusion has become a new trend. The capacitive-optical hybrid sensor recently reported in 2024 can quantify muscle deformation through capacitance changes, and combines near-infrared spectroscopy (NIRS) to achieve real-time analysis of muscle oxygen metabolism status, which increases the sensitivity of muscle fatigue assessment by 40%, significantly better than traditional single-modality sensing solutions.

China has also made significant progress in high-density sEMG acquisition systems. The 128-channel array sEMG acquisition device designed by Li Yidong's team in 2015 uses a submodule architecture and WiFi wireless transmission to achieve high-concurrency, low-crosstalk data acquisition. Zhao Zhangyan and colleagues proposed linear electrodes, printed electrodes, and spring-type probes as electrode structure innovations, effectively solving the problems of traditional electrodes detaching and producing unstable signals; electrode stability was verified through vector impedance testing, and a motion artifact filtering circuit reduced baseline signal fluctuations by 70%.

sEMG signal is essentially a non-stationary, low-amplitude bioelectric signal that is susceptible to noise interference. In order to extract its effective information, signal preprocessing and feature extraction become key links. Common preprocessing steps include removing DC components, bandpass filtering, removing power frequency interference, normalization, etc. In recent years, with the development of signal processing theory, nonlinear analysis methods such as variational mode decomposition (VMD), empirical mode decomposition (EMD), and Hilbert-Huang transform (HHT) have been introduced into the field of sEMG signal processing. These methods can effectively separate muscle activity components in different frequency bands and improve the signal-to-noise ratio and feature resolution of the signal.

In terms of feature extraction, time domain, frequency domain and time-frequency domain features are widely used. Time domain features such as root mean square (RMS), mean absolute value (MAV), waveform length (WL), etc. can reflect the overall intensity of muscle contraction; frequency domain features such as median frequency (MF) and mean power frequency (MPF) are used to analyze the changes in the spectrum during muscle fatigue. In recent years, combined with time-frequency analysis methods such as wavelet packet transform (WPT) and short-time Fourier transform (STFT), researchers have been able to capture the dynamic changes of muscle activation more carefully. For example, the instantaneous frequency feature extraction method based on EEMD-HT proposed by Cao Ang et al., combined with the band spectral entropy and PSO-SVM optimization algorithm, achieved a muscle fatigue classification accuracy of more than 90%, which is better than the traditional frequency domain method.

In addition, dynamic signal segmentation and adaptive segmentation detection algorithms are also used for automatic segmentation of sEMG signals, which improves the accuracy of action recognition. Wu Yansheng proposed an adaptive segmentation detection algorithm based on rolling absolute value average, combined with wavelet decomposition and anti-shake technology, which reduced the false action detection rate from 15% to 5%.

Basic Theory of Deep Learning

With the rapid development of artificial intelligence technology, sEMG signal analysis has gradually evolved from traditional machine learning methods to deep learning. Early studies mostly used traditional classifiers such as support vector machines (SVM), linear discriminant analysis (LDA), and K-nearest neighbors (KNN), combined with manually extracted features for muscle fatigue or motion recognition. For example, Zhang Xu built an 8-channel sEMG real-time gesture recognition system through anatomically guided sensor layout optimization, and used SVM classifiers to achieve recognition of 20 kinds of fine gestures, with an online control delay of less than 200ms.

In recent years, the application of deep learning models in sEMG signal analysis has gradually increased. Cao Shuhao built a 32-layer 1D ResNet model based on the Swiss Ninapro database, achieving an accuracy of 85% in 50-category gesture classification tasks. The spatiotemporal attention network (TSAN) proposed in 2024 is based on the Transformer structure and achieved an accuracy of 92.3% in the 50-category gesture classification task of the Ninapro DB7 dataset, an increase of 11 percentage points over the traditional convolutional neural network. These studies show that deep learning models can automatically extract high-order features from raw signals and significantly improve classification performance.

In terms of multimodal data fusion, the MIT team proposed a sEMG-EEG-IMU multi-source data joint encoding framework, which uses the dynamic time warping (DTW) algorithm to achieve high-precision synchronization of cross-modal signals, reducing the error to 3.8% in the gait phase prediction task. In terms of high-density signal spatial analysis, the Noraxon Ultium system integrates 128-channel high-density sEMG technology and separates motor unit action potentials (MUAPs) through an independent component analysis (ICA) algorithm, with a spatial resolution of 1mm², providing a new idea for accurately locating the activation area of motor units.

2.5 Application of sEMG in rehabilitation and motor control

sEMG signals are increasingly used in rehabilitation medicine and motion control. In the field of nerve injury rehabilitation, sEMG can monitor the patient's muscle activation in real time and assist doctors in developing personalized rehabilitation plans. For example, the federated learning framework deployed on the Microsoft Azure platform supports distributed model training of more than 100,000 sEMG data, and can build patient-specific muscle coordination models, providing quantitative tools for the development of individualized rehabilitation plans. In the field of rehabilitation robot control, Sun Xin established an sEMG-based elbow angle BP neural network mapping model with a prediction error of less than 8°, and integrated it into a 5-DOF exoskeleton robot to achieve autonomous motion control for paralyzed patients.

In the field of sports science, sEMG can not only quantify muscle activation but also reveal fine-grained details; for example, shortening the activation delay of the latissimus dorsi by 10 milliseconds during swimming take-offs can increase jump speed by 1.2%. As an interactive interface for biological signals, sEMG also shows great potential in emerging applications such as brain-computer interfaces and the metaverse. The sEMG-EEG hybrid decoding system developed by Ottobock has compressed the control delay of bionic limbs to 120 ms and expanded the degrees of freedom of movement to 22, marking a major advance in fine motor control technology.

2.6 Current status and development trends of research at home and abroad

Looking at the current status of international research, developed regions such as Europe and the United States lead in high-end sEMG equipment, signal processing algorithms, and intelligent rehabilitation systems. High-density, multi-channel sEMG systems launched by companies such as Delsys and Noraxon have been widely used in clinical practice and scientific research. Domestic researchers have made significant progress in high-density acquisition systems, low-cost wearable devices, and electrode material innovation, but there is still room for improvement in core algorithms, chip design, and the market share of high-end equipment. At present, the domestic high-end sEMG equipment market is largely dominated by foreign brands, the price of a single system is high, and domestic substitutes are urgently needed.

In the future, with the continued development of flexible electronics, artificial intelligence, and multimodal sensing technology, the acquisition, processing, and application of sEMG signals will become more intelligent and personalized. Multimodal data fusion, deep learning models, and personalized rehabilitation programs will become research hotspots. At the policy level, the "14th Five-Year Plan" has listed intelligent rehabilitation equipment as a key development direction, and the rehabilitation medical equipment market is expected to exceed 100 billion yuan by 2025. Enhancing the core competitiveness of domestic sEMG equipment and promoting independent innovation in algorithms and chips will be key to China's breakthrough in this field.

Chapter 3 Design of sEMG signal analysis model based on deep learning

3.1 Research framework and overall design

Based on the standardized sEMG data acquisition experiment, this study established a complete data analysis and modeling system through standardized file management and preprocessing. All raw data are stored in CSV format and divided into two categories, "fatigue" and "non fatigue", with each category placed in a corresponding folder. Each CSV file records a single-channel signal of one subject in a specific muscle movement state, sampled at 1000 Hz, with the main recorded column labeled "amplitudo". There are 26 samples in each category, so the sample sizes are balanced and representative. During data reading, the file list is obtained by traversing the folders, and file names are standardized by removing redundant suffixes and converting them to lowercase, ensuring uniqueness and consistency in subsequent batch processing. This structured management not only facilitates tracking and automated processing, but also lays a solid foundation for data preprocessing.
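As a minimal sketch of this file-management step (the folder names and the helper function are illustrative assumptions, not the exact code used in this study), the traversal and name normalization can be written as follows:

import os

def list_signal_files(folder):
    # Collect CSV files from one category folder; names are compared in lowercase
    files = []
    for name in os.listdir(folder):
        if name.lower().endswith('.csv'):
            files.append(os.path.join(folder, name))
    return sorted(files)

fatigue_files = list_signal_files('fatigue')
nonfatigue_files = list_signal_files('non fatigue')
print(len(fatigue_files), len(nonfatigue_files))  # expected: 26 and 26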

3.1.2 sEMG signal preprocessing process

On the basis of this data management, and in order to eliminate deviations introduced by instrument drift or environmental noise, this study designed a complete sEMG signal preprocessing pipeline. First, zero baseline correction is applied: the mean of each original signal is used as the correction reference and the signal is shifted toward zero, effectively eliminating the DC component. Then, full-wave rectification (taking the absolute value of the signal) is applied to enhance the expression of signal energy. Because the original signal often contains irrelevant noise segments, the data are truncated and only the key time periods are retained; for example, the two most representative segments of the movement are selected and their index ranges are strictly screened so that only the main stage of the signal is kept. The truncated signal then enters the normalization step, where all sample amplitudes are mapped to the [0,1] interval, which compensates for inherent differences between individuals and provides a unified scale for subsequent model training. Finally, Butterworth bandpass filtering (with a main passband of 10-100 Hz) effectively removes high-frequency noise and low-frequency drift, highlighting the effective components of the signal. This processing flow is invoked automatically through batch functions, and each signal file ultimately outputs two high-quality data segments in preparation for time-frequency feature extraction and subsequent visualization.
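The following is a minimal sketch of this preprocessing chain (the filter order and the function name are illustrative assumptions; the segment-interception step is omitted for brevity):

import numpy as np
from scipy.signal import butter, filtfilt

def preprocess_segment(x, fs=1000, low=10, high=100):
    x = x - np.mean(x)                                 # zero baseline correction: remove the DC component
    x = np.abs(x)                                      # full-wave rectification
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)    # normalize amplitudes to [0, 1]
    b, a = butter(4, [low, high], btype='bandpass', fs=fs)  # Butterworth band-pass, 10-100 Hz main band
    return filtfilt(b, a, x)                           # zero-phase filtering avoids introducing time shifts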

3.1.3 Time-frequency feature extraction and visualization

Considering that sEMG signals are essentially non-stationary, with frequency components that change over time, time-domain analysis alone cannot capture all of their dynamic features. Therefore, this study uses the short-time Fourier transform (STFT) to perform time-frequency analysis on the processed signals. Specifically, the signal is divided into multiple short time windows, the instantaneous spectrum is obtained by Fourier transform within each window, and the results of all windows are combined into a two-dimensional matrix in chronological order. With the help of graphical tools, the matrix is rendered as a color spectrogram in which the horizontal axis represents time, the vertical axis represents frequency, and the colors intuitively reflect changes in the signal energy distribution. To facilitate direct reading by subsequent deep learning models, all spectrograms are cropped, resized, and uniformly saved in PNG format with axes and labels removed, ensuring that the image information is clean and easy to archive. The differences between categories in the time-frequency images are quite obvious, which also provides a basis for automatic classification.

Figure 1 Two-dimensional time-frequency spectrum image
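As a minimal sketch of the spectrogram generation described above (the STFT window length, figure size, and log scaling are illustrative assumptions):

import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import stft

def save_spectrogram(x, out_path, fs=1000, nperseg=256):
    f, t, Zxx = stft(x, fs=fs, nperseg=nperseg)                     # windowed FFT of the preprocessed signal
    plt.figure(figsize=(3, 3))
    plt.pcolormesh(t, f, np.log1p(np.abs(Zxx)), shading='gouraud')  # log-scaled magnitude for readability
    plt.axis('off')                                                 # strip axes and labels for a clean image
    plt.savefig(out_path, bbox_inches='tight', pad_inches=0, dpi=100)
    plt.close()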

3.1.4 Dataset construction and enhancement

After the time-frequency images are generated, the study turns to dataset construction and augmentation. All generated images are stored by category, and statistical checks confirm that the two categories are balanced. The dataset is divided using the common train/validation/test approach: all images are randomly split into training and test sets at an 8:2 ratio, and 20% of the training set is randomly selected as the validation set. This division prevents data leakage and ensures objective evaluation. To improve the robustness and generalization of the model to different inputs, data augmentation techniques such as rotation, translation, scaling, mirror flipping, and brightness perturbation are applied to the training set, so that each original image yields diverse variants during training. All images are uniformly resized to 224×224-pixel RGB format so that the deep convolutional network receives inputs of fixed dimensions. Because augmentation is applied in real time as the automatic loader reads data in batches, the augmented samples do not need to be stored in advance, which saves storage resources and increases the model's adaptability to real scenarios.

3.1.5 Deep Learning Model Training and Evaluation Process

In the deep learning model training and evaluation stage, this study mainly tested two schemes: a custom convolutional neural network and transfer learning models. The custom network uses multiple convolution, pooling, batch normalization, and Dropout layers to extract image features with a small number of parameters, while the transfer learning models use pre-trained networks such as ResNet50 and VGG16 as feature extractors, freezing the original network parameters and adding only fully connected layers at the top, thereby improving classification performance under small-sample conditions with the help of large-scale pre-training. All models use binary_crossentropy as the loss function, together with the Adam optimizer. To alleviate the adverse effect of class imbalance on training, the study also automatically calculates class weights so that the minority class receives more attention during training. During training, the loss and accuracy on the training and validation sets are monitored in real time, and an early stopping mechanism is used to prevent overfitting and ensure the network generalizes to the test set.

3.1.6 Research Flowchart

In order to present the training effect intuitively, the study also plotted loss and accuracy curves, and used confusion matrices, classification reports, and other indicators to quantitatively evaluate the model output. The network structure was also visualized with dedicated tools to display the network hierarchy and the connections between layers. All evaluation results provide detailed data support for subsequent model improvement and system optimization.

In general, the entire process, from data acquisition to signal preprocessing, to time-frequency feature extraction, visualization, data set construction and enhancement, and finally to the training and evaluation of the deep model, is carefully designed at each step to ensure data quality and model effect. Standardized data management and automated processing procedures not only reduce the errors that may be caused by human intervention, but also provide a scientific basis for repeated experiments. This systematic research framework not only solves the problems of noise and individual differences in the original signal, but also realizes efficient classification of fatigue status through deep learning technology, laying a solid foundation for future promotion in larger samples and actual application scenarios.

3.2 Network structure design

3.2.1 Network Input and Data Format

The network input data all come from the sEMG time-frequency images obtained through preprocessing and feature extraction. These images are converted from the original electromyographic signals by the short-time Fourier transform and therefore reflect dynamic changes in both time and frequency. To match current mainstream convolutional neural networks, all images are uniformly resized to 224×224 pixels in RGB three-channel format. In practice, the generated images are stored in folders representing the fatigue and non-fatigue classes and then loaded in batches with the flow_from_directory method of ImageDataGenerator, with real-time image augmentation and normalization applied during loading. The image pixel values are normalized to the [0,1] range through the rescale parameter to improve training stability and convergence. A stratified sampling strategy is adopted in the dataset division to keep the class ratios consistent across the training, validation, and test sets, providing a scientific and repeatable input data basis for the model.

3.2.2 Customized Convolutional Neural Network Structure

For the network structure design, we built a simplified custom convolutional neural network. The input layer accepts the preprocessed 224×224×3 image, and two groups of convolution and pooling layers then extract low-level and mid-level spatial features. The first group uses 32 filters with 3×3 convolution kernels and "same" padding so that the output size stays close to the input; batch normalization and ReLU activation follow, and the nonlinearly mapped features are passed to a max pooling layer. The second group increases the number of channels to 64 and repeats a similar process to mine more complex feature information. The feature maps are then flattened through a Flatten layer and connected to a 128-unit fully connected layer, again with batch normalization and ReLU activation, and Dropout (set to 0.5) is applied to reduce the fully connected layer's parameter dependence and prevent overfitting. Finally, an output layer consisting of a single neuron with a Sigmoid activation produces the fatigue/non-fatigue binary classification. During model compilation, binary_crossentropy is selected as the loss function, Adam as the optimizer, and the learning rate is set according to experimental tuning. This design fully extracts the spatial features of the time-frequency images while accommodating the small sample size and limited computing resources.
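A minimal Keras sketch of this architecture is shown below (the pooling sizes are illustrative assumptions; the learning rate follows the value reported in Section 3.4.2):

from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam

model = models.Sequential([
    layers.Conv2D(32, (3, 3), padding='same', input_shape=(224, 224, 3)),  # first convolution group
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), padding='same'),                             # second convolution group
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128),
    layers.BatchNormalization(),
    layers.Activation('relu'),
    layers.Dropout(0.5),                                                   # reduce co-adaptation in the dense layer
    layers.Dense(1, activation='sigmoid'),                                 # fatigue / non-fatigue probability
])
model.compile(optimizer=Adam(learning_rate=1e-6), loss='binary_crossentropy', metrics=['accuracy'])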

3.2.3 Transfer Learning Model (VGG16, ResNet50)

We also introduced a transfer learning strategy to further improve the generalization performance under small sample data. Taking ResNet50 as an example, its pre-trained model weights use the parameters trained with the ImageNet dataset. When loading the pre-trained model, we removed the fully connected classification part of the top layer, retained only the convolutional backbone network, and kept the original feature extraction capability by freezing the backbone network parameters. Subsequently, a global average pooling layer was added after the backbone network, and then a 128-unit fully connected layer, a Dropout layer, and a final output layer were connected to achieve the binary classification task. Similarly, the VGG16 model, after loading its pre-trained convolutional layer, added a custom fully connected part at the top layer, and both models used the same data augmentation strategy as the custom CNN during training to achieve a unified training environment. In this way, with the help of the common edge, texture and other features learned by the pre-trained model on a large dataset, the classification accuracy and stability of the model in small sample scenarios can be significantly improved.

(1) ResNet50 transfer learning model

Load the ImageNet pre-trained weights through ResNet50(weights='imagenet', include_top=False, input_shape=(height, width, 3)), keeping only the convolution part (excluding the top fully connected layer).

Freeze all parameters of the ResNet50 backbone network (base_model.trainable = False) and only train the top custom fully connected layer to prevent overfitting.

A global average pooling layer (GlobalAveragePooling2D), a 128-unit fully connected layer (Dense), Dropout (0.5), and a Sigmoid output layer are added after the backbone network to achieve binary classification.

When compiling the model, the Adam optimizer is used with the learning rate set to 1e-4, the loss function is binary_crossentropy, and validation set performance is monitored in real time during training.

(2) VGG16 transfer learning model

Load the pre-trained convolutional layer of VGG16 (VGG16(weights='imagenet', include_top=False, input_shape=(height, width, 3))), freeze the parameters, and only train the top custom fully connected layer.

The top-level structure is similar to ResNet50, including global average pooling, full connection, Dropout and Sigmoid output layers.

The transfer learning model can make full use of common features such as edges and textures learned from large-scale data sets to improve the model's recognition ability of sEMG time-frequency images.
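The construction described in (1) and (2) can be sketched as follows; the input size and head layout follow Section 3.2.1, the ReLU activation in the dense layer is an assumption, and the fragment is illustrative rather than the exact project code:

from tensorflow.keras.applications import ResNet50
from tensorflow.keras import layers, models
from tensorflow.keras.optimizers import Adam

base_model = ResNet50(weights='imagenet', include_top=False, input_shape=(224, 224, 3))
base_model.trainable = False                      # freeze the pre-trained convolutional backbone

model = models.Sequential([
    base_model,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(1, activation='sigmoid'),        # binary fatigue / non-fatigue output
])
model.compile(optimizer=Adam(learning_rate=1e-4), loss='binary_crossentropy', metrics=['accuracy'])
# For VGG16, replace ResNet50 with VGG16 in the import and constructor; the top-level head is identical.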

(3) Training and evaluation process

During the training process, the same data augmentation and partitioning strategy as the custom CNN is adopted to ensure the fairness of the comparative experiment.

The number of training epochs is set between 10 and 100. Model performance is monitored on the validation set, and strategies such as EarlyStopping are used to terminate training early and prevent overfitting.

Evaluate the model's accuracy, confusion matrix, classification report and other indicators on the test set to comprehensively measure the model performance.

By introducing the transfer learning model, this study not only improved the classification accuracy of the model, but also enhanced the generalization ability and robustness of the model in small sample scenarios. The comparative experiment between transfer learning and custom CNN provides strong theoretical and practical support for subsequent model optimization and practical application.

3.2.4 Network Regularization and Measures to Prevent Overfitting

In terms of network regularization and prevention of overfitting, this study has taken a variety of measures. The internal regularization measure is reflected in the Dropout mechanism. Dropout (0.5) is used after the fully connected layer to randomly inactivate some neurons, thereby reducing the dependence between neurons and enhancing the robustness of the model. On the other hand, Batch Normalization is widely used in various convolutional modules and fully connected layers to solve the problem of internal covariate shift and accelerate the gradient descent process. At the data level, we use data augmentation technology as a supplement, and use ImageDataGenerator to achieve random rotation, translation, scaling, flipping and brightness changes of images, thereby expanding the training set and reducing the risk of overfitting for a single sample. In addition, during the training process, the early stopping callback function (EarlyStopping) and the learning rate decay (ReduceLROnPlateau) strategy are also used: when the validation set loss no longer decreases within a number of consecutive epochs, the training is automatically stopped or the learning rate is reduced to keep the model in the best state. These multiple measures together constitute a complete set of strategies to prevent overfitting, ensuring that the model not only performs well on the training data, but also shows a high generalization ability on the test data.

2. Batch Normalization

Batch normalization is used to address changes in the input distribution across layers (internal covariate shift). It normalizes the inputs of each mini-batch to stabilize the input distribution of each network layer. In the code, a BatchNormalization layer is embedded after each convolutional layer and fully connected layer.
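The fragment below illustrates this placement (a minimal functional-API sketch; the layer sizes follow Section 3.2.2 and are not the exact project code):

from tensorflow.keras import layers, Input

inputs = Input(shape=(224, 224, 3))
x = layers.Conv2D(32, (3, 3), padding='same')(inputs)
x = layers.BatchNormalization()(x)   # normalize each mini-batch before the activation
x = layers.Activation('relu')(x)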

This approach not only speeds up the convergence of the network, but also plays a regularization role to a certain extent, helping to prevent overfitting by reducing internal covariate shifts.

3. Data Augmentation

In addition to the regularization measures within the network structure, at the data level, we use data augmentation technology to expand the training samples. Data augmentation simulates various changes in the natural environment by performing random rotation, translation, scaling, flipping, brightness adjustment and other operations on the training samples, so that the model can learn more robust features when facing more diverse inputs. ImageDataGenerator is used in the code to implement this strategy, for example:
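The parameter values below are illustrative rather than the exact ranges used in this study:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,           # map pixel values to [0, 1]
    rotation_range=15,           # random rotation (degrees)
    width_shift_range=0.1,       # random horizontal translation
    height_shift_range=0.1,      # random vertical translation
    zoom_range=0.1,              # random scaling
    horizontal_flip=True,        # mirror flipping
    brightness_range=(0.8, 1.2)  # brightness perturbation
)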

Data augmentation effectively expands the original samples, smoothes the uncertainty of data distribution during training, and also plays a positive role in preventing model overfitting.

4. Early Stopping and Learning Rate Adjustment

During the training process, we used callback functions such as EarlyStopping and ReduceLROnPlateau. These strategies can automatically stop training or reduce the learning rate when the loss of the monitoring validation set no longer decreases, thereby preventing the model from falling into overfitting or local optimality.
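A typical configuration of these Keras callbacks is sketched below; the monitored quantity and patience values are illustrative assumptions:

from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau

callbacks = [
    EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),  # stop when validation loss stalls
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=5, min_lr=1e-7), # halve the learning rate on plateaus
]
# history = model.fit(train_ds, validation_data=val_ds, epochs=100, callbacks=callbacks)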

These strategies dynamically adjust the model parameter update rhythm during training, which helps the model better generalize to the test set data.

Regularization parameter setting

L2 regularization (weight decay) can also be set directly in some fully connected or convolutional layers. Although this study mainly relies on Dropout and BatchNormalization, adding L2 regularization is also a common practice; for example, kernel_regularizer=l2(0.01) can be added to a Dense layer to limit excessive weight updates and further suppress overfitting.
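As an illustrative fragment (the coefficient 0.01 follows the example above):

from tensorflow.keras import layers
from tensorflow.keras.regularizers import l2

dense = layers.Dense(128, activation='relu', kernel_regularizer=l2(0.01))  # weight decay on this layer only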

Through the above-mentioned regularization and anti-overfitting measures, this study adopted a full range of protection from internal structure to training strategy in network design to ensure that the network does not overfit on the training data, while having good generalization performance, providing higher accuracy and robustness for the final fatigue state classification.

3.2.5 Network structure visualization and model interpretation

In order to understand the entire network structure more intuitively, this paper also visualizes and explains the model. The visualkeras library can be used to generate a hierarchical network structure diagram, which details the type, output size, and number of parameters of each layer. This not only provides intuitive materials for paper writing, but also helps with subsequent model debugging. In the visualization diagram, from the input layer to the convolution layer, pooling layer, and then to the fully connected layer and output layer, the structure and connection relationship of each part are clear at a glance. At the same time, the built-in plot_model function of TensorFlow is also used as an auxiliary to save the network structure diagram, which helps to cross-validate whether the parameter configuration of each layer is accurate. Through these methods, we not only verified the rationality of the model design, but also provided a basis for discovering possible redundant layers or parameter bottlenecks during the debugging process. Combined with actual training logs, loss and accuracy curves, and visualization charts such as confusion matrices, the performance and shortcomings of the model in the fatigue state classification task are further explained, providing specific improvement directions for subsequent network optimization.
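A minimal sketch of these two visualization calls, assuming model is the network built in Section 3.2.2 and with illustrative output file names (plot_model additionally requires pydot and graphviz to be installed):

import visualkeras
from tensorflow.keras.utils import plot_model

visualkeras.layered_view(model, to_file='model_layers.png', legend=True)  # layered block diagram of the network
plot_model(model, to_file='model_graph.png', show_shapes=True)            # layer graph with output shapes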

3.3 Loss Function and Optimization Strategy

In the training process of deep learning models, loss functions and optimization strategies play a crucial guiding role in achieving accurate classification of fatigue and non-fatigue states. For the task of binary classification of sEMG signals in this study, this paper not only considers the matching degree between the output probability and the true label when building the model, but also pays more attention to guiding the model to converge to the ideal state to the maximum extent in actual scenarios with small samples and incompletely balanced data distribution. To this end, a systematic design was carried out in many aspects such as loss function, optimizer, learning rate setting, and class imbalance processing.

3.3.1 Choice of loss function (binary_crossentropy)

This study uses binary cross entropy as the loss function. The basic formula of binary cross entropy can be expressed as:

  L = -[y·log(p) + (1-y)·log(1-p)]

Among them, y represents the true label (0 or 1), and p is the model's predicted probability of the positive class (fatigue). The main reason for using this loss function is that it is highly sensitive to the probability output and imposes a large penalty on prediction errors, forcing the model to correct deviations quickly. This loss function is used in both the custom convolutional neural network and the transfer learning models (such as ResNet50 and VGG16) to keep the training objectives consistent. Practice has shown that with binary cross entropy, the model captures the subtle distribution differences between signals in different states more accurately, thereby improving classification performance.

3.3.2 Optimizer selection and parameter setting (Adam and its learning rate)

For the optimizer, this study uses Adam, which combines the advantages of momentum and adaptive learning-rate adjustment, allowing the model to approach convergence quickly in the early stage of training and remain stable in the later stage. The core of the Adam optimizer is to dynamically adjust the learning rate of each parameter, which is particularly important when processing high-dimensional, complex time-frequency feature images. Specifically, for the custom CNN model, we set a lower learning rate to avoid gradient oscillation, while for the transfer learning models, since the pre-trained weights already provide good feature extraction, only the top layers need fine-tuning and a relatively higher learning rate is chosen to accelerate adaptation to the new task. In extensive experimental comparisons, the Adam optimizer showed faster convergence and better classification metrics than traditional SGD on this task. Code example:
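The sketch below shows the two compile calls implied here; the variable names model and transfer_model are illustrative, and the learning-rate values follow Sections 3.4.2 and 3.2.3:

from tensorflow.keras.optimizers import Adam

# Custom CNN: a very low learning rate to avoid gradient oscillation
model.compile(optimizer=Adam(learning_rate=1e-6), loss='binary_crossentropy', metrics=['accuracy'])

# Transfer learning model: a relatively higher learning rate, since only the top layers are trained
transfer_model.compile(optimizer=Adam(learning_rate=1e-4), loss='binary_crossentropy', metrics=['accuracy'])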

3.3.3 Learning rate and training strategy (fixed/dynamic learning rate, EarlyStopping, etc.)

The learning rate is a key hyperparameter affecting training speed and final performance. In the initial stage, this study set different fixed learning rates according to the model structure and data volume, and then adjusted them during training by monitoring the validation loss. Although the code in this work does not explicitly introduce dynamic callbacks such as ReduceLROnPlateau, in practice we watched the training curves and intervened manually when necessary, preventing an excessively high learning rate from destabilizing training or an excessively low one from slowing convergence.

3.3.4 Class imbalance processing (automatic calculation and application of class_weight)

In actual collection, although fatigue and non-fatigue samples are roughly balanced, there may be a slight deviation in quantity after segmentation. To this end, this paper adopts a method of automatically calculating category weights, and uses the sklearn tool to determine the weight of each category in the loss according to the distribution of training set labels. In this way, during the training process, the model can automatically increase its attention to the minority class to avoid the situation where the model tends to the majority class due to data skew. After this dynamic adjustment, the experiment shows that the recall rate and F1 score of the model on the minority class have been significantly improved, which fully proves the effectiveness of the strategy.

The specific steps are as follows:

Use `sklearn.utils.class_weight` to automatically calculate weights based on the training set labels.

When training the model, the calculated class weights are passed to the `fit` function through the `class_weight` parameter.

The code is implemented as follows:
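A sketch under the assumption that the training labels are available as an integer array (the variable name train_labels is illustrative, e.g. taken from the training generator's classes attribute):

import numpy as np
from sklearn.utils import class_weight

weights = class_weight.compute_class_weight(class_weight='balanced',
                                            classes=np.unique(train_labels), y=train_labels)
class_weights = dict(enumerate(weights))   # maps class index to its weight in the loss

# history = model.fit(train_ds, epochs=10, validation_data=val_ds, class_weight=class_weights)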

3.3.5 Training process monitoring and regularization (validation set monitoring, Dropout, BatchNorm)

In order to further improve the generalization ability of the model and prevent overfitting on the training data, this paper introduces a variety of regularization measures in the network design and training process. First, in the fully connected layer, some neurons are randomly dropped through a Dropout layer, which reduces the model's excessive dependence on specific nodes and makes it more robust to new samples. Second, Batch Normalization layers are embedded after each convolutional layer and fully connected layer, which helps stabilize the input distribution of each layer, accelerates training, and also provides a degree of regularization. In addition, at the data level, a data augmentation strategy expands the training samples through random rotation, translation, scaling, flipping, and brightness perturbation, enabling the model to learn robust features from more diverse image inputs and further suppressing overfitting. During training, to keep the indicators consistent between the training and validation sets, we also plotted the loss and accuracy curves and adjusted the training strategy in time based on continuous monitoring.

Add a Dropout layer (such as `Dropout(0.5)`) after the fully connected layer, and randomly discard 50% of the neurons during each training to reduce the dependency between neurons and effectively prevent overfitting.

A BatchNormalization layer is added after the convolutional layer and the fully connected layer to stabilize the input distribution of each layer, speed up the convergence speed, and play a certain regularization role.

Use `ImageDataGenerator` to perform various enhancement operations (rotation, translation, scaling, flipping, brightness perturbation, etc.) on the training set images, which greatly improves the generalization ability of the model.

During the training process, the loss/accuracy curves of the training set and validation set are monitored in real time to prevent overfitting. The code uses `matplotlib` to draw the training and validation loss curves to facilitate observation of whether the model is overfitting or underfitting.

3.3.6 Comprehensive training and optimization process

Combining the above measures, the training and optimization process of this study can be summarized as follows: in the model compilation stage, binary_crossentropy is used as the loss function, the Adam optimizer adjusts parameter updates, and a suitable initial learning rate is set; the training data are augmented in real time through ImageDataGenerator, and automatically calculated class weights balance the class contributions in the loss function; Dropout, BatchNormalization, and other layers are integrated into the model structure to prevent overfitting; and finally, the learning rate is adjusted in time with the help of monitoring callbacks, while an early stopping strategy keeps the training process stable and orderly. As a result, the model achieved the desired classification performance on the test set, with accuracy, recall, precision, F1 score, and other indicators at a high level, demonstrating the effectiveness of the overall optimization strategy.

3.4 Model training and evaluation

This section will introduce in detail the whole process of training and evaluating the deep learning-based sEMG muscle fatigue classification model in this study. The content covers the scientific division and loading of the data set, the training process and parameter setting, performance evaluation indicators, training process visualization, and result analysis and discussion. All contents are closely combined with actual code implementation to ensure the combination of theory and practice.

3.4.1 Dataset division and loading

The division and loading of the dataset are the basis of model training. After all preprocessing and time-frequency spectrogram generation, the images are stored by category in the "spectrograms_fatigue" and "spectrograms_nonfatigue" folders. Then, using the os and shutil libraries in Python, the images are split into training and test sets at an 8:2 ratio, and 20% of the training samples are randomly selected as the validation set. During the split, we strictly keep the proportion of each category consistent across the subsets, which ensures a balanced sample distribution and provides an objective basis for subsequent evaluation. During data loading, the flow_from_directory method of ImageDataGenerator loads batches automatically by directory. In the code, the following augmentation parameters are set: images are normalized to the [0,1] interval when loaded and uniformly resized to 224×224 RGB three-channel format; the training set additionally undergoes rotation, translation, scaling, random flipping, brightness perturbation, and other augmentation operations, so that each original image generates several variants during training and the sample space is maximized, while the test set is only minimally processed so that the evaluation results stay close to the actual application scenario. The following code shows the implementation of data augmentation and loading.

Code Implementation
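The sketch below illustrates this loading and augmentation setup; the directory paths, batch size, and the specific augmentation ranges are illustrative assumptions:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation for the training images; validation_split reserves 20% of them for validation
train_datagen = ImageDataGenerator(rescale=1.0 / 255, rotation_range=15, width_shift_range=0.1,
                                   height_shift_range=0.1, zoom_range=0.1, horizontal_flip=True,
                                   validation_split=0.2)
test_datagen = ImageDataGenerator(rescale=1.0 / 255)      # rescaling only, no augmentation for evaluation

train_ds = train_datagen.flow_from_directory('dataset/train', target_size=(224, 224),
                                              class_mode='binary', batch_size=16, subset='training')
val_ds = train_datagen.flow_from_directory('dataset/train', target_size=(224, 224),
                                            class_mode='binary', batch_size=16, subset='validation')
test_ds = test_datagen.flow_from_directory('dataset/test', target_size=(224, 224),
                                            class_mode='binary', batch_size=16, shuffle=False)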

Through this process, the dataset not only achieves stratified sampling, but also comprehensively considers dataset expansion and normalization during the loading process, providing a solid foundation for subsequent training.

3.4.2 Training process and parameter setting

Next, in the model training phase, we systematically trained the custom CNN model and the transfer learning model respectively. The custom CNN model mainly contains multiple layers of convolution, pooling, batch normalization, activation and fully connected layers. At the same time, the Dropout mechanism (such as Dropout(0.5)) is used after the fully connected layer to reduce the risk of overfitting; while the transfer learning model uses the pre-trained convolution layer of ResNet50 or VGG16 as the feature extractor, and only adds the global average pooling, fully connected and Dropout layers to the top layer. After freezing the weights of the backbone network, only the newly added layers are trained. The input size of all models is fixed to 224×224×3, and the output uses Sigmoid activation to achieve binary classification. When compiling the model, we uniformly selected binary_crossentropy as the loss function and used the Adam optimizer to update the parameters. For the custom CNN, the initial learning rate is set low (for example, 0.000001), while the transfer learning only needs to fine-tune the top-level parameters, so the learning rate is appropriately increased (for example, 0.0001). In the code, the model compilation part is as follows:

model.compile(optimizer=Adam(learning_rate=0.000001), loss='binary_crossentropy', metrics=['accuracy'])

history = model.fit(train_ds, epochs=10, validation_data=val_ds, class_weight=class_weights)

Among them, class_weights is automatically calculated by the sklearn tool to balance the contribution of each category. This allows the model to pay more attention to minority class samples when the number of categories is relatively uneven, thereby improving the overall classification performance. In addition, to prevent the model from overfitting during training, we introduced Dropout and BatchNormalization layers in each layer, and used the EarlyStopping strategy and dynamic learning rate adjustment callback function to monitor the training process. Real-time monitoring of the training and validation loss and accuracy curves is an important basis for judging whether the model is overfitting. If the validation set loss suddenly increases in the later stage of training, it may be necessary to stop training in advance or reduce the learning rate to stabilize parameter updates.

3.4.3 Performance Evaluation Indicators

In order to comprehensively evaluate the performance of the model, this study introduced multiple indicators. The most commonly used accuracy directly reflects the proportion of correct classifications of the model; further, we use the confusion_matrix tool to construct a confusion matrix to observe the classification details of the model from four perspectives: true positive, false positive, true negative, and false negative; in addition, through classification_report, the system outputs comprehensive indicators such as precision, recall, and F1 score to comprehensively judge the performance of the model on fatigue and non-fatigue samples. The specific evaluation code is as follows:

Code implementation
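A sketch of the evaluation step, assuming the trained model and the test generator test_ds from 3.4.1 (loaded with shuffle=False so that labels align with predictions):

import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

y_prob = model.predict(test_ds)                   # predicted probabilities on the test set
y_pred = (y_prob > 0.5).astype(int).ravel()       # threshold at 0.5 for the binary decision
y_true = test_ds.classes                          # ground-truth labels in loading order

print(confusion_matrix(y_true, y_pred))
print(classification_report(y_true, y_pred, target_names=list(test_ds.class_indices.keys())))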

Through the above multi-dimensional evaluation, the classification ability and practical application value of the model can be comprehensively and objectively reflected.

3.4.4 Visualization of the training process

In order to intuitively understand the training process, we also use matplotlib to draw the loss curves of the training set and the validation set. The graph shows the loss change trend of each epoch, which can not only confirm whether the training is stable, but also judge whether there is overfitting or underfitting by the difference between the curves. The code example is as follows:

Code implementation
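A sketch of the curve plotting, assuming history is the object returned by model.fit in 3.4.2:

import matplotlib.pyplot as plt

plt.plot(history.history['loss'], label='training loss')
plt.plot(history.history['val_loss'], label='validation loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend()
plt.show()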

Through the above visualization methods, abnormal situations in the training process can be discovered in time, guiding the optimization adjustment of model structure and parameters.

3.4.5 Results Analysis and Discussion

In the result analysis and discussion, we not only quantitatively evaluated the accuracy, confusion matrix, and classification report of the model on the test set, but also combined the comparative experiments of different models (custom CNN and transfer learning model) to analyze the impact of data enhancement, category weights, and regularization strategies on the overall performance. If the recall rate or precision rate on a certain category is not good, it may be necessary to further optimize feature extraction or adjust the network structure. In addition, through ablation experiments, we can gradually check the contribution of each module to the model performance, which is of great significance for a deep understanding of the internal working mechanism of the model.

3.5 Chapter Summary

This chapter systematically describes the entire process of designing and implementing a deep learning-based sEMG signal muscle fatigue state classification model. Through a detailed introduction to key links such as raw signal acquisition, preprocessing, feature extraction, data set construction, model design, training and evaluation, this paper fully demonstrates the innovation and scientificity of this study at the theoretical and practical levels. The following is a summary and induction of the main work of this chapter.

In the data collection and original signal description section, the source, structure and basic information of the sEMG dataset used in this study are clarified. The dataset contains two categories, "fatigue" and "non fatigue", which are stored in different folders. The number of samples in each category is balanced, ensuring the scientificity and representativeness of subsequent experiments. Through code statistics and file name normalization, a solid foundation is laid for subsequent batch processing and automated analysis.

In the signal preprocessing process, given that sEMG signals are easily affected by noise and baseline drift, a multi-step pipeline was designed that includes zero baseline correction, full-wave rectification (taking the absolute value), interval interception, normalization, and Butterworth bandpass filtering. Each step is implemented through a custom function, and all samples are processed in batches, which greatly improves the signal-to-noise ratio and comparability of the signals. The preprocessed signal is not only of higher quality but also provides a reliable data foundation for subsequent feature extraction and modeling.

The short-time Fourier transform (STFT) is used in the time-frequency feature extraction and visualization process to convert the one-dimensional time series signal into a two-dimensional time-frequency spectrum image (spectrogram). All images are of uniform size and the coordinate axes are removed to facilitate direct reading by the deep learning model. Through STFT, the time and frequency characteristics of the signal can be fully displayed, providing rich information for the model to automatically extract complex features. The experiment found that there are obvious differences in energy distribution and spectral structure between the time-frequency images under fatigue and non-fatigue states, which provides a theoretical basis for the effective classification of the model.

In the dataset construction and augmentation part, stratified sampling and data augmentation are combined to divide the training, validation, and test sets scientifically. ImageDataGenerator performs various augmentation operations on the training images (such as rotation, translation, scaling, flipping, and brightness perturbation), which greatly improves the generalization ability of the model. All images are unified in 224×224×3 RGB format and the labels are binary-encoded, ensuring standardized model inputs and reproducible experiments.

In terms of deep learning model design, we built custom convolutional neural networks (CNNs) and transfer learning models (such as ResNet50 and VGG16). The custom CNN structure is concise and efficient, suitable for small and medium-sized data sets, and can effectively extract the spatial features of time-frequency images. The transfer learning model makes full use of the common features learned by pre-trained models on large-scale data sets (such as ImageNet), significantly improving the classification performance in small sample scenarios. All models use the binary_crossentropy loss function and Adam optimizer, combined with regularization measures such as category weights, Dropout, and BatchNormalization, to ensure the efficiency and stability of training.

In the loss function and optimization strategy section, key strategies such as loss function, optimizer, learning rate, category imbalance processing, regularization and training monitoring are elaborated in detail. By reasonably selecting loss function and optimizer, scientifically setting learning rate, automatically calculating category weights, and adopting a variety of regularization measures (such as Dropout, BatchNormalization, data enhancement, etc.), the training efficiency, stability and generalization ability of the model are greatly improved. During the training process, the loss and accuracy of the training set and validation set are monitored in real time, and callback functions such as EarlyStopping are combined to optimize the training process to prevent overfitting.

In the model training and evaluation phase, scientific data set division and loading, reasonable training parameter settings, multi-dimensional performance evaluation indicators (accuracy, confusion matrix, classification report, etc.), intuitive training process visualization, and in-depth results analysis and discussion were used to fully verify the effectiveness and practical application value of the model. By comparing the classification performance of the custom CNN and transfer learning models, and analyzing the impact of different structures on sEMG signal classification, it provides strong theoretical and practical support for subsequent model optimization and actual deployment.

In summary, this chapter not only introduces in detail the design and implementation process of the deep learning-based sEMG signal muscle fatigue state classification model, but also ensures the scientificity and reproducibility of each step through a large number of code implementations and experimental verifications. Through multi-step preprocessing, feature extraction, data enhancement and deep learning modeling, the accuracy and practicality of sEMG signal muscle movement state analysis have been greatly improved. The above processes and methods have laid a solid theoretical and practical foundation for subsequent experimental results analysis, model optimization and practical application promotion.

The work in this chapter provides a complete technical route and theoretical support for the experimental design, result analysis, and model optimization in subsequent chapters, and also offers a useful reference for the in-depth application of sEMG signals in intelligent rehabilitation, motion monitoring, and other fields.

Experimental design and results analysis

This study aims to use deep learning methods to identify fatigue states from surface electromyography (sEMG) signals. The experimental design therefore considers the hardware platform, software environment, data preprocessing, model building, and evaluation as a whole. Overall, this chapter introduces the hardware and software environment of the experiment, the preparation of the dataset, the signal preprocessing and time-frequency image generation process, the data augmentation strategy, and the subsequent model training and performance evaluation. Although each part is relatively independent, they are closely related and together constitute a complete experimental pipeline, providing sufficient data and theoretical support for the subsequent analysis of results.

4.1 Experimental environment and platform

In terms of the experimental platform, the author mainly uses two environments for comparative verification and model debugging. The local platform is a personal computer with an Intel Core i7 processor and 16 GB of memory running Windows 10 64-bit, which is suitable for data preprocessing and for preliminary training and debugging of small-scale models. In terms of the software environment, Python 3.11.11 is the main programming language. The dependent libraries mainly include TensorFlow 2.x and its Keras interface for building deep networks, scikit-learn for data partitioning and evaluation metric calculation, and matplotlib and seaborn for result visualization. Data processing mainly relies on pandas and numpy, while the filtering and short-time Fourier transform steps in signal processing use the corresponding functions of the scipy library. In addition, the visualkeras library plays an important role in visualizing the network structure. Through the organic combination of these tools and libraries, this study was able to debug step by step in the Jupyter Notebook environment, record intermediate results, and intuitively display the changes after processing at each stage.

4.2 Dataset Preparation and Preprocessing

4.2.1 Description of the original dataset

To ensure the scientific rigor and standardization of the experimental data, the dataset used in this study comes from a standardized sEMG acquisition experiment and an open-source dataset. The raw data are divided into two categories, fatigue and non-fatigue, stored in separate folders, with 26 samples in each category. During acquisition, the signal is single-channel, the sampling frequency is 1000 Hz, and the main recorded indicator is "amplitudo" (signal amplitude). When the data are inventoried, the reading and count verification of each file is performed programmatically, ensuring that the dataset is as balanced as possible across categories and providing an objective basis for subsequent model training.
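
To illustrate the sample-count check described above, a minimal sketch is given below; the directory name, class folder names, and file extension are illustrative assumptions rather than the exact layout used in the experiment.

```python
import os

# Hypothetical directory layout: one sub-folder per class holding the raw signal files.
data_root = "sEMG_data"
for label in ("fatigue", "non_fatigue"):
    folder = os.path.join(data_root, label)
    files = [f for f in os.listdir(folder) if f.endswith(".csv")]
    print(f"{label}: {len(files)} samples")  # expected: 26 samples per class
```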

4.2.2 Signal preprocessing process

In the data preprocessing part, the author first performed zero baseline correction on the raw signal to eliminate the DC offset introduced by the acquisition equipment and environment. Specifically, the overall mean of the signal was calculated with np.mean and subtracted from the raw data, so that the signal fluctuates around zero. Next, to ensure non-negative values and facilitate subsequent extraction of absolute energy, the absolute value of the signal was taken. Since noise remains after these steps, a filtering stage was further designed: a 4th-order Butterworth bandpass filter with cutoff frequencies of 10 Hz and 100 Hz, which removes low-frequency drift and high-frequency noise while retaining the key information of the movement. In the implementation, the author called the butter and filtfilt functions in scipy.signal and specified the filter order and the low and high cutoff frequencies. After several rounds of parameter adjustment and verification, a preprocessing result was obtained that is visually smoother while preserving a high degree of information fidelity.
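
A minimal sketch of this preprocessing chain is given below; the function and variable names are illustrative, while the filter order, cutoff frequencies, and sampling rate follow the values stated above.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def preprocess(raw, fs=1000.0, low=10.0, high=100.0, order=4):
    """Zero baseline correction, rectification, and band-pass filtering of one sEMG record."""
    centred = raw - np.mean(raw)       # remove the DC offset (zero baseline correction)
    rectified = np.abs(centred)        # absolute value to keep the amplitude non-negative
    nyq = fs / 2.0
    b, a = butter(order, [low / nyq, high / nyq], btype="bandpass")
    return filtfilt(b, a, rectified)   # zero-phase filtering avoids introducing a time shift

# Example with synthetic data standing in for one recorded channel (10 s at 1 kHz).
signal = np.random.randn(10_000) + 0.3
clean = preprocess(signal)
```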

4.2.3 Time-frequency image generation

After completing the preprocessing, the focus of the study turned to how to extract time-frequency features that are conducive to the determination of fatigue status. This paper uses short-time Fourier transform (STFT) to perform time-frequency conversion on each segment of the preprocessed signal, and converts the one-dimensional time series into a two-dimensional spectrogram. The specific process is to use the stft function in scipy.signal to set the sampling rate to 1000Hz and the window length to 1000 points to segment the signal and obtain the energy distribution of the frequency in each time window. Subsequently, in order to ensure the consistency of the image feature input into the deep network, the generated time-frequency spectrum is cropped and adjusted to a uniform size of 224×224 pixels. At the same time, when generating the image, the redundant coordinate axes and labels are removed to ensure that the spectrum data is purer and convenient for subsequent modeling. Such a conversion process not only intuitively shows the changes in the signal in the time and frequency domains, but also effectively combines traditional signal analysis with modern image processing methods, providing a reasonable basis for subsequent convolutional neural networks based on image input.
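
A minimal sketch of this spectrogram generation step is given below, assuming matplotlib is used to render and save the images; the output path and figure settings are illustrative, while the sampling rate, window length, and 224×224 target size follow the values stated above.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.signal import stft

def save_spectrogram(signal, out_path, fs=1000, nperseg=1000):
    """Convert a preprocessed 1-D sEMG segment into an axis-free spectrogram image."""
    f, t, Zxx = stft(signal, fs=fs, nperseg=nperseg)
    fig = plt.figure(figsize=(2.24, 2.24), dpi=100)  # 224x224 pixels at 100 dpi
    ax = fig.add_axes([0, 0, 1, 1])                  # fill the canvas, no margins
    ax.pcolormesh(t, f, np.abs(Zxx), shading="gouraud")
    ax.axis("off")                                   # drop axes, ticks, and labels
    fig.savefig(out_path)
    plt.close(fig)

# Example: one spectrogram for a synthetic 10 s segment.
save_spectrogram(np.random.randn(10_000), "sample_spectrogram.png")
```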

4.2.4 Data Augmentation and Partitioning

With the data converted into image form, enhancing the diversity of the dataset during training becomes a key issue. To reduce the risk of overfitting caused by the limited number of samples, the author uses the real-time augmentation capability of ImageDataGenerator. In the experiment, each original time-frequency image undergoes multiple transformations such as rotation, translation, scaling, flipping, and brightness perturbation, so that each image yields roughly five augmented samples during training. To keep the statistical characteristics of the data consistent across the training, validation, and test phases, a unified size standard (224×224×3 RGB images) and binary label encoding (fatigue = 1, non-fatigue = 0) are adopted. All images are organized in the corresponding directory structure and loaded in batches via the flow_from_directory method. This approach not only ensures that augmentation happens in real time, but also, by setting a validation split, reserves a fixed proportion of samples in each training run to monitor the model in real time, thereby achieving both data partitioning and normalization.
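
A minimal sketch of this augmentation and loading setup is given below; the directory name and the specific augmentation ranges are illustrative assumptions, while the image size, batch loading, and binary class mode follow the description above.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Real-time augmentation on the training stream; the parameter values are illustrative.
train_gen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
    horizontal_flip=True,
    brightness_range=(0.8, 1.2),
    validation_split=0.2,           # hold out part of the data for validation
)

# Hypothetical folder layout: spectrograms/<class_name>/*.png
train_flow = train_gen.flow_from_directory(
    "spectrograms", target_size=(224, 224), batch_size=32,
    class_mode="binary", subset="training")
val_flow = train_gen.flow_from_directory(
    "spectrograms", target_size=(224, 224), batch_size=32,
    class_mode="binary", subset="validation")
```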

4.3 Experimental design

4.3.1 Experimental Grouping

In the experimental design part, to evaluate the performance of different models on the fatigue classification task, the author designed several experimental groupings. A custom convolutional neural network was constructed with a relatively simple overall structure, consisting of several convolutional layers, batch normalization layers, activation layers, pooling layers, and fully connected layers; after repeated experiments, the number of layers and the parameter configuration suitable for small and medium-sized datasets were determined. At the same time, to take advantage of pre-training on large-scale datasets, classic networks such as VGG16 and ResNet50 were selected for transfer learning, i.e., the pre-trained network is used as a feature extractor and only the fully connected layers at the top are trained, improving generalization under small-sample conditions. In the actual implementation, detailed parameter settings were made for both kinds of models, including input size, batch size, number of training epochs, and learning rate. The learning rate for the custom network is kept low so that the model can converge gradually during fine-tuning, while a slightly higher learning rate is set for the transfer learning part to adapt quickly to the new data. To prevent class imbalance from interfering with training, the author also uses the class weight calculation method in sklearn to automatically obtain the weights of the different classes and passes them to the training function, so that the loss function fully accounts for the importance of minority-class samples.
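
The two model families can be sketched roughly as follows; the exact number of layers and filter sizes used in the experiments may differ, so the architectures below are illustrative, with only the input size, loss function, optimizer, and learning rates taken from the settings described in this chapter.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import ResNet50
from tensorflow.keras.optimizers import Adam

def build_custom_cnn(input_shape=(224, 224, 3)):
    """Small CNN: Conv -> BN -> ReLU -> Pool blocks followed by a dense head."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding="same"), layers.BatchNormalization(),
        layers.Activation("relu"), layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same"), layers.BatchNormalization(),
        layers.Activation("relu"), layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"), layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=Adam(learning_rate=1e-6),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

def build_transfer_model(input_shape=(224, 224, 3)):
    """Frozen ResNet50 feature extractor with a trainable dense head."""
    base = ResNet50(weights="imagenet", include_top=False,
                    input_shape=input_shape, pooling="avg")
    base.trainable = False                 # only the top dense layers are trained
    model = models.Sequential([
        base,
        layers.Dense(128, activation="relu"), layers.Dropout(0.5),
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer=Adam(learning_rate=1e-4),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model
```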

4.3.2 Hyperparameter Settings

Input size: 224×224×3

Batch size: 32

Number of training epochs: 10~100

Optimizer: Adam; learning rate 1e-6 for the custom CNN and 1e-4 for the transfer learning models

Loss function: binary_crossentropy

Regularization: Add Dropout (0.5) after the fully connected layer, and add BatchNormalization after the convolution and fully connected layers

4.3.3 Dealing with Class Imbalance

Use `sklearn.utils.class_weight` to automatically calculate class weights and balance the contributions of different classes in the loss function.

During model training, the class_weight parameter is passed in to improve the model's ability to recognize minority classes.
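
A minimal sketch of this class-weight computation is given below; the label vector is a hypothetical stand-in for the labels produced by the training generator.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical label vector; in the experiment the labels come from the training generator.
train_labels = np.array([0, 0, 0, 1, 1])       # 0 = non-fatigue, 1 = fatigue
weights = compute_class_weight(class_weight="balanced",
                               classes=np.unique(train_labels),
                               y=train_labels)
class_weight = dict(enumerate(weights))        # e.g. {0: 0.83, 1: 1.25}
# Passed to model.fit(..., class_weight=class_weight) so the loss up-weights the minority class.
```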

4.3.4 Training and Validation Process

In the model training and validation stage, the augmented image data are loaded dynamically through ImageDataGenerator, and the model monitors the loss and accuracy of the training and validation sets in real time during training. The early stopping strategy not only interrupts training automatically when the validation loss stops decreasing, thereby avoiding overfitting, but also preserves the best model parameters. During training, the author recorded the loss and accuracy of each epoch in detail and used matplotlib to plot the corresponding curves, which intuitively show the gradual convergence of the models. The training behavior of the different models showed some differences: the custom CNN converged faster in the early stage, while the transfer learning model converged slightly more slowly but exhibited better stability in the final test results.
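
Continuing the earlier sketches (and assuming the model, data generators, and class_weight dictionary defined there), the training loop and convergence-curve plotting can be sketched as follows:

```python
import matplotlib.pyplot as plt
from tensorflow.keras.callbacks import EarlyStopping

# Stop when the validation loss stops improving and keep the best weights seen so far.
early_stop = EarlyStopping(monitor="val_loss", patience=10, restore_best_weights=True)

history = model.fit(
    train_flow, validation_data=val_flow,
    epochs=100, class_weight=class_weight,
    callbacks=[early_stop])

# Loss and accuracy curves for the training and validation sets.
for metric in ("loss", "accuracy"):
    plt.figure()
    plt.plot(history.history[metric], label=f"train_{metric}")
    plt.plot(history.history[f"val_{metric}"], label=f"val_{metric}")
    plt.xlabel("epoch")
    plt.legend()
    plt.show()
```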

4.4 Model Training and Performance Evaluation

In the model evaluation stage, the author performs a unified evaluation on the test set and uses the confusion matrix and classification report (including accuracy, precision, recall, and F1 score) to quantitatively analyze the classification performance. In the test stage, model.evaluate is used to obtain the overall accuracy, model.predict is used to obtain the prediction for each sample, and the confusion matrix is then generated from the predictions and the ground-truth labels. For ease of comparison, the confusion matrix is visualized as a heat map with the seaborn library, from which the confusion between categories can be seen intuitively. The classification report discusses each metric in detail; it shows that although most samples are classified correctly, some marginal samples are still misjudged when the signal fluctuates strongly. Reflecting on the data acquisition and signal preprocessing steps, the author believes that in subsequent work the filtering and augmentation methods can be further refined, or multiple feature extraction methods can be fused, to improve the model's robustness for complex signals.
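
Continuing the earlier sketches (and assuming a trained model and a non-shuffled test generator test_flow), the evaluation step can be sketched as follows; the class-name ordering depends on the generator's class_indices and is shown here only for illustration.

```python
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, classification_report

# test_flow is assumed to be built with shuffle=False so predictions align with labels.
loss, acc = model.evaluate(test_flow)
probs = model.predict(test_flow)
y_pred = (probs.ravel() > 0.5).astype(int)   # threshold the sigmoid outputs
y_true = test_flow.classes                   # ground-truth labels from the generator

cm = confusion_matrix(y_true, y_pred)
print(classification_report(y_true, y_pred, target_names=["non-fatigue", "fatigue"]))

# Heat map of the confusion matrix for a quick view of between-class confusion.
sns.heatmap(cm, annot=True, fmt="d", cmap="Blues",
            xticklabels=["non-fatigue", "fatigue"],
            yticklabels=["non-fatigue", "fatigue"])
plt.xlabel("Predicted")
plt.ylabel("Actual")
plt.show()
```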

Figure 2 CNN loss curve

Figure 3 ResNet50 loss curve


Summary

This study focuses on time-frequency analysis of surface electromyographic (sEMG) signals and deep convolutional neural networks, exploring the problem of real-time identification of muscle fatigue states. Through data preprocessing, time-frequency feature extraction, and CNN-based model design and training, a preliminary but effective distinction of fatigue states has been achieved. Over multiple experiments and systematic verifications, this work has accumulated valuable experience in methodology and implementation details, and provides a theoretical basis and practical reference for subsequent related research. This chapter focuses on the research innovations, limitations, and future research directions, aiming to present the overall value and shortcomings of this study from both theoretical and practical perspectives and to explore possible future improvements.

5.1 Summary of research innovations

This study has made certain innovations in a number of techniques and methods. In terms of data preprocessing, this paper combines zero baseline correction with absolute-value processing based on the inherent characteristics of sEMG signals, and applies a 4th-order Butterworth bandpass filter, which retains the main characteristics of the signal while effectively reducing noise interference. This processing method, grounded in physical principles and signal characteristics, provides an accurate signal basis for subsequent time-frequency analysis and reflects the idea of combining theory with practice. In terms of time-frequency feature extraction, this paper uses the short-time Fourier transform (STFT) to convert one-dimensional time-domain signals into two-dimensional spectrograms and feeds the spectrograms into deep learning models, realizing an effective transition from traditional signal processing to data-driven learning. This method not only intuitively displays how the signal changes in the time and frequency domains, but also provides rich, intuitive information for subsequent feature learning by the CNN.

In terms of model design, this study proposed a lightweight convolutional neural network structure, which reduces the number of training parameters while achieving high classification accuracy within a short training time. By combining the traditional CNN modules with Batch Normalization, Dropout, and other strategies to prevent overfitting, the problem of insufficient generalization caused by the small number of samples was alleviated to a certain extent. At the same time, by drawing on pre-trained models (such as VGG16 and ResNet50), this study demonstrated the possibility of leveraging large-scale models in small-sample settings, which has reference value for the field of medical signal analysis. During the research, the author iterated not only on the model architecture but also on the dataset division and training strategy, seeking the best solution in the details, so that the overall system performs stably and robustly on the experimental data. In general, this study has made systematic explorations in data preprocessing, feature extraction, and deep model design, and has verified in practice the organic combination of traditional methods and modern deep learning, thereby proposing a reference solution with practical value for muscle fatigue state recognition.

5.2 Limitation Analysis

Although this study has achieved certain results in its methods and implementation, there are still many limitations. In terms of data collection, the number of sEMG samples used in this experiment is relatively small, and due to the limitations of equipment performance and experimental conditions, the data are relatively noisy and cannot fully reflect the actual situation of different people in different motion states. Because of the limited number of subjects and the experimental equipment, the samples may exhibit large individual differences and imbalance, which adversely affects model training and classification performance to a certain extent. In the preprocessing stage, although zero baseline correction and filtering improved the signal quality, the extraction of subtle changes and high-frequency information implicit in the signal is still insufficient. Especially when the noise level is high or there are abrupt changes, the traditional filtering strategy may not eliminate all interference, thereby affecting the stability and accuracy of the subsequent STFT.

In terms of model construction, this study chose a relatively simple convolutional neural network structure. Although it trains quickly and has few parameters, it limits the deep characterization of complex signal patterns to some extent. The current model's recognition of some complex samples is still insufficient, and it is not robust enough to edge cases and interference factors in the samples. At the same time, owing to the data-driven nature of deep learning, the model is prone to underfitting or overfitting when the training data are insufficient or the sample distribution is uneven. The early stopping strategy and data augmentation used in this paper alleviate this to a certain extent but do not solve it completely. In addition, the model is sensitive to hyperparameter settings; parameter tuning during training often relies on extensive experiments and experience and lacks automated optimization methods, which limits the breadth and depth of the system's promotion and application.

From the perspective of the system as a whole, the paper's work reflects more the results of theoretical exploration and preliminary experimental stages, and has not yet formed a complete application system. In practical applications, issues such as real-time performance, stability, and data security still need further research and resolution. At the same time, in the process of combining theory with experiment, there are also some problems such as insufficient description of experimental details, insufficient standardization of process records, and insufficient verification of data repeatability, which will all be areas that need to be improved and perfected in future work.

5.3 Future Research Directions

In view of the above limitations, relevant research can be further improved from multiple angles in the future. First, data collection and sample size expansion are still important tasks that need to be improved. In the future, more samples should be introduced in the experimental design to cover the electromyographic signals of more groups of different ages, genders, exercise habits, etc. At the same time, we can try to introduce multi-channel acquisition devices to increase the dimension and amount of information of the signal, so as to make the model training more sufficient. In view of the data imbalance problem that may exist in the experimental process, more advanced data enhancement technology or generative models can be used in the future to simulate diversified data and solve some scarce sample problems.

Second, in terms of feature extraction, this paper mainly relies on the STFT for time-frequency conversion. In the future, signal decomposition methods such as wavelet transform and empirical mode decomposition (EMD) can be introduced to extract richer and more detailed signal features. Through multi-feature fusion and dimensionality reduction, the essential characteristics of the signal can be revealed at a deeper level, so that subsequent classification can proceed on clearer and more effective features. At the same time, strengthening the deep integration of signal processing and deep learning is an important direction for future research, which includes not only improvements in model structure but also systematic consideration of training strategies, loss function design, and model adaptation.

Third, in terms of model design and optimization, future research can try more complex deep network structures, such as multi-scale convolutional networks, graph convolutional networks, and even ensemble learning methods, so as to improve the recognition rate of complex signal patterns. At the same time, in order to address the hyperparameter setting problems of existing models, automatic machine learning (AutoML) technology can be introduced in the future, using automatic parameter adjustment methods such as Bayesian optimization or genetic algorithms to further improve model stability and generalization capabilities. Especially in actual application scenarios, how to reduce model computation and energy consumption while ensuring efficient recognition is also a very challenging research direction.

In addition, the actual deployment of the model and the construction of a real-time monitoring system are also important tasks for future research. Combining hardware equipment with edge computing technology, the model is integrated into a portable monitoring device to achieve real-time prediction and early warning of muscle fatigue status, which not only has important application value in sports training, rehabilitation medicine and other fields, but also provides possibilities for the construction of intelligent health management systems. Future research can pay more attention to the real-time, stability and security of the model, explore how to efficiently implement deep learning algorithms in embedded systems, and continuously optimize system performance in actual use.

In general, although this study has proved the feasibility of muscle fatigue classification based on STFT and CNN models to a certain extent, there is still much room for further discussion and improvement in terms of theoretical depth and application breadth. Future research should not only be based on current achievements, but also continue to absorb emerging technologies and methods, combine traditional signal processing with the advantages of modern deep learning, and form a system solution with higher recognition accuracy and application value. By continuously improving data collection methods, feature extraction technology, and network architecture design, it is expected to achieve greater breakthroughs in the field of muscle fatigue state monitoring, thereby promoting the in-depth integration of sports science and medical health management. It can be foreseen that with the continuous advancement of artificial intelligence technology and the deepening of multidisciplinary cross-disciplinary research, signal classification methods based on deep learning will play a more important role in the future and contribute more to the improvement of human health and quality of life.

The above not only systematically reviews the overall summary of this research work, but also clarifies the existing problems and directions for further exploration in the future. In the follow-up work, we will continue to improve the experimental design, explore more efficient algorithms and model optimization techniques, and hope to make new progress in theoretical innovation and engineering practice.