4. Experimental Results
4.1 Traditional Computer Vision Method: SIFT+SVM
4.1.1 Experimental Setup and Parameter Configuration
This study adopts the classical SIFT feature extraction + Bag-of-Words + SVM framework. The technical pipeline is as follows:
Feature engineering:
SIFT descriptors: 128-dimensional local descriptors are extracted from grayscale images, using Gaussian pyramids and gradient orientation histograms to obtain scale- and rotation-invariant local features (Lowe, 2004).
Visual vocabulary: MiniBatch K-Means clustering is applied to all descriptors in the training set, with the number of cluster centers set to K=50, to build a visual vocabulary. Each image is then encoded as a 50-dimensional histogram by counting how often its descriptors map to each cluster center; the histogram is L1-normalized and then standardized with StandardScaler (a minimal sketch of the full pipeline follows this list).
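The feature pipeline above maps directly onto a few library calls. The following is a minimal sketch, assuming OpenCV and scikit-learn; the function names and image-loading details are illustrative, while K=50, L1 normalization, and the StandardScaler step follow this section.

```python
# Minimal sketch of the SIFT + Bag-of-Words feature pipeline described above.
# Assumes OpenCV with SIFT available and scikit-learn; helper names are illustrative.
import cv2
import numpy as np
from sklearn.cluster import MiniBatchKMeans
from sklearn.preprocessing import StandardScaler

K = 50  # number of visual words (cluster centers), as in Section 4.1.1
sift = cv2.SIFT_create()

def extract_descriptors(image_path):
    """Return the 128-dim SIFT descriptors of one grayscale image."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, desc = sift.detectAndCompute(gray, None)
    return desc if desc is not None else np.empty((0, 128), np.float32)

def build_vocabulary(train_paths):
    """Cluster all training-set descriptors into K visual words."""
    all_desc = np.vstack([extract_descriptors(p) for p in train_paths])
    return MiniBatchKMeans(n_clusters=K, random_state=0).fit(all_desc)

def bow_histogram(image_path, kmeans):
    """Encode one image as an L1-normalized K-dim visual-word histogram."""
    desc = extract_descriptors(image_path)
    hist = np.zeros(K, np.float32)
    if len(desc) > 0:
        for w in kmeans.predict(desc):
            hist[w] += 1
        hist /= hist.sum()  # L1 normalization
    return hist

# StandardScaler is fit on the training histograms only and reused on the test set,
# so no test-set statistics leak into the features:
# scaler = StandardScaler().fit(train_histograms)
```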
Classifier configuration:
The SVM uses an RBF kernel; its hyperparameters are tuned via grid search (penalty factor C=10, kernel coefficient γ=0.1), and the model is trained on the balanced dataset.
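A minimal sketch of this configuration, assuming scikit-learn; the grid ranges below are illustrative assumptions, while the RBF kernel and the reported optimum (C=10, γ=0.1) come from the text.

```python
# Grid-searched RBF SVM over the standardized 50-dim BoW features.
# Assumes X_train, y_train from the pipeline sketched above.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}  # illustrative ranges
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="f1_weighted")
search.fit(X_train, y_train)  # Section 4.1.1 reports C=10, gamma=0.1 as the optimum
svm = search.best_estimator_
```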
Dataset split:
A stratified random split assigns 80% of the images to the training set (9,600 images) and 20% to the test set (2,400 images), keeping the class distribution consistent across the two sets (640 training and 160 test images per class).
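A minimal sketch of this split, assuming scikit-learn and hypothetical `paths` / `labels` arrays holding all 12,000 image paths and their class labels.

```python
# Stratified 80/20 train/test split, preserving per-class proportions.
from sklearn.model_selection import train_test_split

train_paths, test_paths, y_train, y_test = train_test_split(
    paths, labels,
    test_size=0.2,        # 2,400 test images
    stratify=labels,      # identical per-class proportions in both sets
    random_state=0,
)
```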
4.1.2 Multi-Dimensional Evaluation Results
Basic performance metrics (Table 1):
| Metric | Value | Notes |
| --- | --- | --- |
| Accuracy | 63.0% | Overall classification correctness |
| Macro-averaged precision | 62.3% | Unweighted mean of per-class precision |
| Macro-averaged recall | 63.1% | Unweighted mean of per-class recall |
| Weighted F1-score | 62.8% | Aggregate metric accounting for class imbalance |
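For reference, the Table 1 metrics map directly onto scikit-learn calls; a minimal sketch, assuming `y_test` and the SVM predictions `y_pred` from the classifier above.

```python
# Computing the Table 1 metrics with scikit-learn.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

print("Accuracy:       ", accuracy_score(y_test, y_pred))
print("Macro precision:", precision_score(y_test, y_pred, average="macro"))
print("Macro recall:   ", recall_score(y_test, y_pred, average="macro"))
print("Weighted F1:    ", f1_score(y_test, y_pred, average="weighted"))
```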
Per-class performance analysis (Table 2):
| Class | Precision | Recall | F1-score | Most frequent confusions (count) |
| --- | --- | --- | --- | --- |
| Agriculture | 92.3% | 95.0% | 93.6% | None significant |
| Forest | 85.4% | 84.4% | 84.9% | Grassland (15), Desert (4) |
| Lake | 72.1% | 68.8% | 70.4% | Beach (21), River (13) |
| Highway | 68.2% | 71.3% | 69.7% | Airport (14), Port (20) |
| River | 58.3% | 62.0% | 60.1% | Beach (18), Lake (15) |
4.1.3 Key Findings
Strong classes: categories with salient structural features, such as "Parking" (F1=94.2%) and "Port" (F1=93.5%), are classified well because their local keypoints differ markedly from those of other classes.
Common bottleneck: classes with similar texture or color (e.g., "Grassland→Desert", with a 19.4% misclassification rate) expose the inability of hand-crafted features to represent global semantics: SIFT cannot capture contextual cues such as vegetation-cover density or terrain relief.
4.2 Deep Learning Methods: ResNet18 vs. EfficientNet-B0
4.2.1 Comparison of Experimental Parameters (Table 3)
| Parameter | ResNet18 | EfficientNet-B0 | Notes |
| --- | --- | --- | --- |
| Network depth | 18 layers | B0 variant (compound scaling) | Residual connections vs. compound scaling strategy |
| Input size | 224×224 | 224×224 | Identical resolution for a fair comparison |
| Batch size | 32 | 8 | Limited by EfficientNet-B0 GPU memory usage (batch size 8 avoids overflow on an RTX 3090) |
| Training time | 8 minutes | 12 minutes | Lightweight model vs. efficient feature-extraction architecture |
| Pretrained weights | ImageNet | ImageNet | Same pretraining source ensures fairness |
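A minimal sketch of how the two Table 3 configurations can be instantiated, assuming PyTorch/torchvision; the 15-class head size is inferred from the per-class counts in Section 4.1.1 (9,600 / 640 per class) and is an assumption.

```python
# ImageNet-pretrained backbones with the classifier head replaced for this task.
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 15  # assumed: 9,600 training images at 640 per class

# ResNet18 with ImageNet weights, final fully connected layer replaced
resnet = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
resnet.fc = nn.Linear(resnet.fc.in_features, NUM_CLASSES)

# EfficientNet-B0 with ImageNet weights, classifier head replaced
effnet = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.IMAGENET1K_V1)
effnet.classifier[1] = nn.Linear(effnet.classifier[1].in_features, NUM_CLASSES)
```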
4.2.2 In-Depth Comparison of Quantitative Results
Overall performance (Table 4):
| Metric | SIFT+SVM | ResNet18 | EfficientNet-B0 | Improvement (vs. SIFT+SVM) |
| --- | --- | --- | --- | --- |
| Accuracy (%) | 63.0 | 88.08 | 95.04 | +32.04% / +50.86% |
| Weighted F1-score | 0.62 | 0.79 | 0.95 | +53.2% / +62.1% |
| Mean F1 on minority classes | 0.55 | 0.72 | 0.89 | +61.8% / +36.9% |
Training dynamics (Figure 1):
ResNet18's validation accuracy plateaus at 88% after epoch 5, indicating the convergence ceiling of the shallower network;
EfficientNet-B0 overtakes ResNet18 at epoch 3 and reaches 95.5% at epoch 5, reflecting the benefit of multi-scale feature fusion (a sketch of the underlying training loop is given below).
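A minimal sketch of the fine-tuning loop behind the Figure 1 curves, assuming PyTorch, the models sketched above, and standard `DataLoader`s named `train_loader` / `val_loader` with 224×224 inputs; the optimizer and learning rate are illustrative assumptions, not the paper's reported settings.

```python
# Fine-tuning loop with per-epoch validation accuracy, as plotted in Figure 1.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = effnet.to(device)  # or resnet, for the ResNet18 curve
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # assumed settings
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(5):
    model.train()
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()

    # Track validation accuracy once per epoch
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for x, y in val_loader:
            preds = model(x.to(device)).argmax(dim=1).cpu()
            correct += (preds == y).sum().item()
            total += y.numel()
    print(f"epoch {epoch + 1}: val acc = {correct / total:.4f}")
```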
4.2.3 Per-Class Robustness Comparison
| Class | SIFT+SVM F1 | ResNet18 F1 | EfficientNet-B0 F1 | Key improvement factor |
| --- | --- | --- | --- | --- |
| Lake | 70.4% | 85.2% | 92.3% | Captures the spatial relationship between water boundaries and surrounding land cover (e.g., "Lake→Beach" confusions drop from 21 to 5) |
| River | 60.1% | 81.7% | 89.5% | Learns global structural features of river courses (e.g., curvature, vegetation cover along the basin) |
5. Discussion
5.1 Root Causes of the Performance Gap Between Methods
1. Fundamental differences in feature-representation paradigms
SIFT+SVM: relies on hand-designed local features with a fixed feature-space dimension (50), and therefore cannot model complex semantics.
Deep learning: automatically learns hierarchical features and supports multi-scale structural modeling.
2. Data efficiency and generalization ability
Traditional methods show a clear performance ceiling, whereas the deep models raise accuracy substantially by leveraging pretraining.
5.2 Impact of Model Design Decisions
Limitations of ResNet18: F1-score = 0.65.
Advantages of EfficientNet-B0: compound scaling extracts richer spatial detail, substantially reducing misclassification rates.
5.3 Research Implications of the Cross-Method Comparison
Traditional methods remain attractive for edge deployment; deep learning still requires further work on lightweight architectures and few-shot performance.
5.4 Complementarity of Evaluation Metrics
Macro vs. weighted averaging exposes how differently each model treats minority and majority classes;
per-class metrics point to concrete optimization directions, such as strengthening performance on small-sample classes (a numeric illustration follows).
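A small numeric illustration of this complementarity, using hypothetical labels rather than the paper's data: the weighted score is dominated by the majority class, while the macro score exposes the weak minority class.

```python
# Why macro and weighted F1 diverge under class imbalance (hypothetical labels).
from sklearn.metrics import f1_score

# 90 majority-class samples classified well, 10 minority samples classified badly
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 90 + [0] * 8 + [1] * 2

print("macro F1:   ", f1_score(y_true, y_pred, average="macro"))     # ~0.65, pulled down by class 1
print("weighted F1:", f1_score(y_true, y_pred, average="weighted"))  # ~0.89, dominated by class 0
```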