# AWDE 0617 CHSIMS English Experiment Report

Date: 2026-06-17  
Code directory: `/root/AWDE/0617-CHSIMS-eng`  
Output directory: `/root/siton-data-531cb60d91bd4013b805b412b0be2176/tlw/store/AWDE/0617-CHSIMS-eng`  
PKL: `/root/siton-data-531cb60d91bd4013b805b412b0be2176/tlw/store/pkl/0617-CHSIMS-eng/chsims_awde_0617_eng_encoder_raw512_fp16.pkl`

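## 1. Objective

Per the user's request, this round runs the CHSIMS dataset in a new directory, `0617-CHSIMS-eng`, and explicitly uses the English text and explanation inputs, because the current modality-explanation pipeline is also in English. The method follows the `0615` Encoder-FRA AWDE mainline; only the data source is switched to CHSIMS English: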

```text
Encoder-FRA features + feature_layers=2 + align_layers=2
+ pre-align FD micro gate + Directed EATS
+ floor-bounded prior SMoE + SmoothL1
+ EMA/composite selection
```

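Search1 (a 4-card search) ran first; based on its results, Search2 (a 4-card targeted hyperparameter search) was run as a follow-up. Both rounds have finished, and NPUs 0-3 currently have no training processes.

## 2. Data Construction

Data sources: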

```text
/root/siton-data-531cb60d91bd4013b805b412b0be2176/tlw/data/CH-SIMS
/root/siton-data-531cb60d91bd4013b805b412b0be2176/tlw/store/CHSIMS-Encoder-FRA
/root/siton-data-531cb60d91bd4013b805b412b0be2176/tlw/store/outputChsims
```

Build output:

| Item | Value |
| --- | --- |
| Dataset | CHSIMS |
| Language | English text |
| Samples | 2281 |
| Split | train 1368 / valid 456 / test 457 |
| PKL size | 1.588 GiB |
| Missing rows | 0 |
| Missing English text | 0 |
| Missing explanation/final/weights | 0 |

Feature dimensions:

| Modality | Feature directory | Shape per sample | Dtype |
| --- | --- | ---: | --- |
| text | `Baichuan-13B-Base-langeng-FRA-50` | `(50, 5120)` | float16 |
| audio | `chinese-hubert-large-FRA-50` | `(50, 1024)` | float16 |
| vision | `clip-vit-large-patch14-FRA-50` | `(50, 768)` | float16 |

Shapes per split:

| Split | Text | Audio | Vision |
| --- | ---: | ---: | ---: |
| train | `(1368, 50, 5120)` | `(1368, 50, 1024)` | `(1368, 50, 768)` |
| valid | `(456, 50, 5120)` | `(456, 50, 1024)` | `(456, 50, 768)` |
| test | `(457, 50, 5120)` | `(457, 50, 1024)` | `(457, 50, 768)` |
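
As a quick sanity check on the shapes above, the size of the fp16 feature tensors alone can be estimated from the sample count, sequence length, and feature widths. This is a back-of-envelope sketch, not the build script: it ignores pickle overhead and any other stored fields (text, explanations, weights, labels), which are assumed, not verified, to account for the gap to the reported 1.588 GiB.

```python
# Back-of-envelope size of the fp16 feature tensors listed in the tables above.
SAMPLES = 2281          # train 1368 + valid 456 + test 457
SEQ_LEN = 50            # FRA-50 sequence length per sample
FEATURE_DIMS = {"text": 5120, "audio": 1024, "vision": 768}
BYTES_PER_FP16 = 2


def feature_payload_bytes(samples, seq_len, dims, itemsize):
    """Total bytes for all modalities, ignoring pickle overhead and metadata."""
    return samples * seq_len * sum(dims.values()) * itemsize


payload = feature_payload_bytes(SAMPLES, SEQ_LEN, FEATURE_DIMS, BYTES_PER_FP16)
payload_gib = payload / 2**30
print(f"feature payload: {payload_gib:.3f} GiB")
```

The estimate comes out somewhat below the reported file size, which is the expected direction if the PKL also holds non-feature fields.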

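CHSIMS labels lie in `[-1, 1]` and take 11 discrete levels: `-1.0, -0.8, -0.6, -0.4, -0.2, 0.0, 0.2, 0.4, 0.6, 0.8, 1.0`. The current script inherits the MOSEI rounding/clipping convention for `Mult_acc_3/5/7`; on this round's CHSIMS outputs the three values are identical. The report still keeps all three fields so that the statistics stay consistent with earlier AWDE reports.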
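
The identical `Mult_acc_3/5/7` values can be reproduced with a small sketch. It assumes the common MOSEI convention (clip to `[-1, 1]`, `[-2, 2]`, or `[-3, 3]`, round to the nearest integer, compare); the actual AWDE script is not shown in this report, and the labels and predictions below are made-up illustration values.

```python
def mult_acc(preds, labels, bound):
    """MOSEI-style accuracy: clip to [-bound, bound], round, then compare."""
    def to_class(x):
        return round(max(-bound, min(bound, x)))

    hits = sum(to_class(p) == to_class(y) for p, y in zip(preds, labels))
    return hits / len(labels)


# Hypothetical labels on the CHSIMS grid and hypothetical predictions.
labels = [-1.0, -0.6, -0.2, 0.0, 0.4, 0.8, 1.0]
preds = [-0.9, -0.3, -0.1, 0.2, 0.7, 0.6, 1.2]

acc3 = mult_acc(preds, labels, 1)
acc5 = mult_acc(preds, labels, 2)
acc7 = mult_acc(preds, labels, 3)
print(acc3, acc5, acc7)  # all three agree while |pred| stays below 1.5
```

Under this convention the three bounds only behave differently once a prediction reaches magnitude 1.5 or more, which is unlikely when labels never leave `[-1, 1]`.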

## 3. Run Settings

Shared settings:

```text
selection = valid Corr - 0.50 * valid MAE + 0.20 * valid Mult_acc_5
loss = SmoothL1(beta=0.25)
EMA = true, ema_decay=0.997, ema_start_epoch=4
feature_layers=2, align_layers=2
desc_gate_mode=pre_align, desc_alpha=0.10
temporal_align_type=eats, temporal_sigma=0.08
temporal_desc_bias=0.35, temporal_confidence_bias=0.20
prior_strength=2.0, weight_floor=0.1
```
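
The `selection` line can be read as a plain scoring function over validation metrics. The sketch below is only a restatement of that formula; the function name is hypothetical, and the inputs are the validation metrics reported in Section 7 for `chsims_eng_h128_lr1e4_b12_d12`.

```python
def composite_score(valid_corr, valid_mae, valid_mult_acc_5):
    """Checkpoint selection score: higher Corr and Mult_acc_5 help, MAE hurts."""
    return valid_corr - 0.50 * valid_mae + 0.20 * valid_mult_acc_5


# Validation metrics of chsims_eng_h128_lr1e4_b12_d12 (Section 7).
score = composite_score(valid_corr=0.8084, valid_mae=0.2934, valid_mult_acc_5=0.7303)
print(round(score, 5))
```

The result matches the `Composite` column reported for that run in Section 4, which suggests the column is computed on validation metrics only.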

The F1 computation was re-checked and uses the corrected `f1_score(y_true, y_pred, average="weighted")`. Search1 uses `epochs=120, early_stop_patience=25`; Search2 uses `epochs=80, early_stop_patience=15`.
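
For readers unfamiliar with the `weighted` average, the sketch below computes a support-weighted F1 in plain Python. It is an illustration of the averaging rule, not the pipeline's code (which calls scikit-learn directly), and the toy labels are made up.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Mean of per-class F1 scores, weighted by each class's true support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        n_pred = sum(p == cls for p in y_pred)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += f1 * n_true
    return total / len(y_true)


# Toy binary example: class 1 has F1 = 0.75 (support 4), class 0 has F1 = 0.50 (support 2).
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
print(round(weighted_f1(y_true, y_pred), 4))
```

Because each class is weighted by its support, the score leans toward the majority class, which matters for the Has0/Non0 comparisons when classes are imbalanced.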

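## 4. Search1 Results

All four Search1 runs use seed `20261700`, and the EMA checkpoint is selected by the validation composite. In the table, `Best` is the selected epoch and `Done` is the last epoch run; `Done - Best` equals the early-stopping patience in every run.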

| Run | Best | Done | Source | Composite | Has0_acc_2 | Has0_F1_score | Non0_acc_2 | Non0_F1_score | Mult_acc_3 | Mult_acc_5 | Mult_acc_7 | MAE | Corr | Zero_recall | Zero_precision | Zero_F1 | Zero_pred_rate | Router [T,A,V] |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | --- |
| `chsims_eng_h128_lr1e4_b12_d12` | 13 | 38 | EMA | 0.80776 | 0.7987 | 0.7933 | 0.8763 | 0.8750 | 0.7002 | 0.7002 | 0.7002 | 0.2898 | 0.7925 | 0.7318 | 0.6065 | 0.6633 | 0.4726 | [0.386454, 0.364212, 0.249335] |
| `chsims_eng_h128_lr1e4_b8_d12` | 8 | 33 | EMA | 0.80330 | 0.8249 | 0.8211 | 0.9046 | 0.9041 | 0.7155 | 0.7155 | 0.7155 | 0.2878 | 0.8137 | 0.7486 | 0.6291 | 0.6837 | 0.4661 | [0.368937, 0.354252, 0.276811] |
| `chsims_eng_h128_lr5e5_b8_d15` | 8 | 33 | EMA | 0.77228 | 0.8118 | 0.8068 | 0.8840 | 0.8824 | 0.6980 | 0.6980 | 0.6980 | 0.3065 | 0.7993 | 0.7095 | 0.6165 | 0.6597 | 0.4508 | [0.376313, 0.356440, 0.267247] |
| `chsims_eng_h160_lr5e5_b8_d15` | 8 | 33 | EMA | 0.76890 | 0.8074 | 0.8032 | 0.8918 | 0.8914 | 0.7133 | 0.7133 | 0.7133 | 0.3041 | 0.7980 | 0.7318 | 0.6268 | 0.6753 | 0.4573 | [0.366587, 0.346516, 0.286897] |

Search1 mean/std (std is the sample standard deviation over the 4 runs):

| Metric | Mean | Std |
| --- | ---: | ---: |
| Has0_acc_2 | 0.8107 | 0.0109 |
| Has0_F1_score | 0.8061 | 0.0115 |
| Non0_acc_2 | 0.8892 | 0.0121 |
| Non0_F1_score | 0.8882 | 0.0125 |
| Mult_acc_3 | 0.7067 | 0.0089 |
| Mult_acc_5 | 0.7067 | 0.0089 |
| Mult_acc_7 | 0.7067 | 0.0089 |
| MAE | 0.2970 | 0.0096 |
| Corr | 0.8009 | 0.0090 |
| Zero_recall | 0.7304 | 0.0160 |
| Zero_precision | 0.6197 | 0.0104 |
| Zero_F1 | 0.6705 | 0.0110 |
| Zero_pred_rate | 0.4617 | 0.0096 |
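
The "sample std" convention (divisor `n - 1`) can be checked against the per-run table with the standard library. The sketch below uses the four Search1 `Has0_acc_2` values; the variable names are illustrative.

```python
from statistics import mean, pstdev, stdev

# Has0_acc_2 of the four Search1 runs, in table order.
has0_acc_2 = [0.7987, 0.8249, 0.8118, 0.8074]

run_mean = mean(has0_acc_2)
sample_std = stdev(has0_acc_2)       # divisor n - 1, the convention used in this report
population_std = pstdev(has0_acc_2)  # divisor n, shown only for contrast

print(f"mean={run_mean:.4f} sample_std={sample_std:.4f} population_std={population_std:.4f}")
```

With only four runs the two conventions differ noticeably, so the choice of divisor should be stated whenever these numbers are quoted.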

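Search1 takeaway: `h128/lr1e-4/b8/dropout0.12` is the strongest first-round configuration, leading on Has0, Non0, Mult, Corr, and Zero-F1. `b12` has a slightly higher validation composite but trails it across the test metrics, which suggests that batch size 12 does not generalize better on CHSIMS.

## 5. Search2 Results

Based on the Search1 results, Search2 runs four targeted variants: a seed re-check, a smaller hidden size, higher dropout, and an added router KL term.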

| Run | Purpose |
| --- | --- |
| `chsims_eng_s2_h128_lr1e4_b8_d12_seed1` | Winner with a different seed, to check stability |
| `chsims_eng_s2_h112_lr1e4_b8_d12` | Lower capacity, to check continuous-regression calibration |
| `chsims_eng_s2_h128_lr1e4_b8_d20` | Raise dropout to 0.20 |
| `chsims_eng_s2_h128_lr1e4_b8_d12_rkl002` | Add `route_kl_weight=0.02` to pull the router toward the explanation-weight prior |

| Run | Best | Done | Source | Composite | Has0_acc_2 | Has0_F1_score | Non0_acc_2 | Non0_F1_score | Mult_acc_3 | Mult_acc_5 | Mult_acc_7 | MAE | Corr | Zero_recall | Zero_precision | Zero_F1 | Zero_pred_rate | Router [T,A,V] |
| --- | ---: | ---: | --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | --- |
| `chsims_eng_s2_h112_lr1e4_b8_d12` | 16 | 31 | EMA | 0.79522 | 0.8118 | 0.8079 | 0.8892 | 0.8887 | 0.7133 | 0.7133 | 0.7133 | 0.2796 | 0.8089 | 0.7486 | 0.6233 | 0.6802 | 0.4705 | [0.408392, 0.352634, 0.238974] |
| `chsims_eng_s2_h128_lr1e4_b8_d12_rkl002` | 11 | 26 | EMA | 0.81935 | 0.7965 | 0.7918 | 0.8789 | 0.8784 | 0.7068 | 0.7068 | 0.7068 | 0.2836 | 0.8101 | 0.7095 | 0.6195 | 0.6615 | 0.4486 | [0.347068, 0.382365, 0.270567] |
| `chsims_eng_s2_h128_lr1e4_b8_d12_seed1` | 11 | 26 | EMA | 0.77564 | 0.8074 | 0.8040 | 0.8840 | 0.8839 | 0.7024 | 0.7024 | 0.7024 | 0.3000 | 0.7889 | 0.7318 | 0.6179 | 0.6701 | 0.4639 | [0.428329, 0.311010, 0.260661] |
| `chsims_eng_s2_h128_lr1e4_b8_d20` | 8 | 23 | EMA | 0.79476 | 0.8053 | 0.7999 | 0.8892 | 0.8881 | 0.7024 | 0.7024 | 0.7024 | 0.3006 | 0.7918 | 0.7542 | 0.6109 | 0.6750 | 0.4836 | [0.374855, 0.345430, 0.279715] |

Search2 mean/std (std is the sample standard deviation over the 4 runs):

| Metric | Mean | Std |
| --- | ---: | ---: |
| Has0_acc_2 | 0.8053 | 0.0064 |
| Has0_F1_score | 0.8009 | 0.0069 |
| Non0_acc_2 | 0.8853 | 0.0049 |
| Non0_F1_score | 0.8848 | 0.0048 |
| Mult_acc_3 | 0.7062 | 0.0052 |
| Mult_acc_5 | 0.7062 | 0.0052 |
| Mult_acc_7 | 0.7062 | 0.0052 |
| MAE | 0.2909 | 0.0109 |
| Corr | 0.7999 | 0.0111 |
| Zero_recall | 0.7360 | 0.0201 |
| Zero_precision | 0.6179 | 0.0052 |
| Zero_F1 | 0.6717 | 0.0080 |
| Zero_pred_rate | 0.4667 | 0.0146 |

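Search2 takeaway: no Search2 run beats the Search1 winner on the combined classification, discrete, and correlation metrics, but the mean MAE drops from 0.2970 to 0.2909. `h112/lr1e-4/b8/dropout0.12` has the lowest MAE, which suggests that reducing capacity improves continuous-value calibration; its Has0, Non0, and Corr are still below the Search1 winner.

## 6. Summary of All 8 Runs

All-8 mean/std (std is the sample standard deviation over the 8 runs):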

| Metric | Mean | Std |
| --- | ---: | ---: |
| Has0_acc_2 | 0.8080 | 0.0088 |
| Has0_F1_score | 0.8035 | 0.0092 |
| Non0_acc_2 | 0.8872 | 0.0088 |
| Non0_F1_score | 0.8865 | 0.0090 |
| Mult_acc_3 | 0.7065 | 0.0068 |
| Mult_acc_5 | 0.7065 | 0.0068 |
| Mult_acc_7 | 0.7065 | 0.0068 |
| MAE | 0.2940 | 0.0101 |
| Corr | 0.8004 | 0.0094 |
| Zero_recall | 0.7332 | 0.0171 |
| Zero_precision | 0.6188 | 0.0077 |
| Zero_F1 | 0.6711 | 0.0089 |
| Zero_pred_rate | 0.4642 | 0.0117 |

Best-by-test-metric (lowest for MAE, highest for every other metric):

| Metric | Best run | Value |
| --- | --- | ---: |
| Has0_acc_2 | `chsims_eng_h128_lr1e4_b8_d12` | 0.8249 |
| Has0_F1_score | `chsims_eng_h128_lr1e4_b8_d12` | 0.8211 |
| Non0_acc_2 | `chsims_eng_h128_lr1e4_b8_d12` | 0.9046 |
| Non0_F1_score | `chsims_eng_h128_lr1e4_b8_d12` | 0.9041 |
| Mult_acc_3 | `chsims_eng_h128_lr1e4_b8_d12` | 0.7155 |
| Mult_acc_5 | `chsims_eng_h128_lr1e4_b8_d12` | 0.7155 |
| Mult_acc_7 | `chsims_eng_h128_lr1e4_b8_d12` | 0.7155 |
| MAE | `chsims_eng_s2_h112_lr1e4_b8_d12` | 0.2796 |
| Corr | `chsims_eng_h128_lr1e4_b8_d12` | 0.8137 |
| Zero_recall | `chsims_eng_s2_h128_lr1e4_b8_d20` | 0.7542 |
| Zero_precision | `chsims_eng_h128_lr1e4_b8_d12` | 0.6291 |
| Zero_F1 | `chsims_eng_h128_lr1e4_b8_d12` | 0.6837 |
| Zero_pred_rate | `chsims_eng_s2_h128_lr1e4_b8_d20` | 0.4836 |

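## 7. Validation Selection Points

The table below lists the main validation metrics of the checkpoint that the composite selected for each run, to help judge whether validation and test agree.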

| Run | Valid Has0 | Valid Has0-F1 | Valid Non0 | Valid Non0-F1 | Valid Mult3 | Valid Mult5 | Valid Mult7 | Valid MAE | Valid Corr | Valid Zero-F1 |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| `chsims_eng_h128_lr1e4_b12_d12` | 0.8048 | 0.8011 | 0.8605 | 0.8592 | 0.7303 | 0.7303 | 0.7303 | 0.2934 | 0.8084 | 0.6952 |
| `chsims_eng_h128_lr1e4_b8_d12` | 0.7851 | 0.7808 | 0.8398 | 0.8384 | 0.7390 | 0.7390 | 0.7390 | 0.2966 | 0.8038 | 0.7071 |
| `chsims_eng_h128_lr5e5_b8_d15` | 0.7851 | 0.7826 | 0.8346 | 0.8349 | 0.7434 | 0.7434 | 0.7434 | 0.3184 | 0.7828 | 0.7139 |
| `chsims_eng_h160_lr5e5_b8_d15` | 0.7961 | 0.7929 | 0.8398 | 0.8387 | 0.7215 | 0.7215 | 0.7215 | 0.3230 | 0.7861 | 0.6990 |
| `chsims_eng_s2_h112_lr1e4_b8_d12` | 0.8026 | 0.7997 | 0.8450 | 0.8439 | 0.7281 | 0.7281 | 0.7281 | 0.3002 | 0.7997 | 0.6954 |
| `chsims_eng_s2_h128_lr1e4_b8_d12_rkl002` | 0.8070 | 0.8021 | 0.8605 | 0.8578 | 0.7500 | 0.7500 | 0.7500 | 0.2837 | 0.8112 | 0.7154 |
| `chsims_eng_s2_h128_lr1e4_b8_d12_seed1` | 0.7895 | 0.7853 | 0.8475 | 0.8464 | 0.7237 | 0.7237 | 0.7237 | 0.3106 | 0.7862 | 0.6872 |
| `chsims_eng_s2_h128_lr1e4_b8_d20` | 0.8070 | 0.8032 | 0.8630 | 0.8617 | 0.7478 | 0.7478 | 0.7478 | 0.3048 | 0.7976 | 0.7194 |

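Validation and test are clearly misaligned: `rkl002` has the highest valid composite, with strong valid MAE, Corr, Mult, and Zero-F1, yet its test Has0, Non0, and Zero-F1 do not follow. The CHSIMS splits are small, so a single valid composite can easily favor a local calibration point; the final recommendation should therefore not rest on the valid composite ranking alone.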
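
One way to quantify this mismatch is a rank correlation between the valid composite and a test metric across the 8 runs. The sketch below implements Spearman's rho with tie-averaged ranks in plain Python and applies it to the `Composite` and test `Has0_acc_2` columns from Sections 4 and 5. With only 8 points this is a rough indicator, not a significance test.

```python
def average_ranks(values):
    """1-based ranks in ascending order; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Pearson correlation computed on tie-averaged ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Run order: b12, b8, lr5e5_d15, h160, s2_h112, s2_rkl002, s2_seed1, s2_d20.
valid_composite = [0.80776, 0.80330, 0.77228, 0.76890, 0.79522, 0.81935, 0.77564, 0.79476]
test_has0_acc_2 = [0.7987, 0.8249, 0.8118, 0.8074, 0.8118, 0.7965, 0.8074, 0.8053]

rho = spearman(valid_composite, test_has0_acc_2)
print(f"Spearman rho (valid composite vs test Has0_acc_2) = {rho:.3f}")
```

On these 8 runs the correlation comes out negative (about -0.4), which is consistent with the observation that a higher valid composite did not translate into a higher test Has0.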

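## 8. Detailed Analysis

1. The strongest overall configuration is `chsims_eng_h128_lr1e4_b8_d12`. On the test set it reaches `Has0=0.8249`, `Has0-F1=0.8211`, `Non0=0.9046`, `Non0-F1=0.9041`, `Mult-5=0.7155`, `Corr=0.8137`, and `Zero-F1=0.6837`, ranking first on essentially every metric except MAE and Zero recall.

2. The lowest MAE comes from `chsims_eng_s2_h112_lr1e4_b8_d12`, with `MAE=0.2796`; its `Corr=0.8089` is also close to the winner. This suggests that a smaller hidden bottleneck helps continuous intensity regression on CHSIMS English, at a small cost in binary and Non0 classification.

3. `route_kl_weight=0.02` clearly raises the valid composite: `0.81935` is the highest of all 8 runs. However, test `Has0=0.7965` and `Zero-F1=0.6615` are on the low side. This looks more like the router overfitting to the validation set once it is constrained by the explanation prior than like a stable generalization gain. The explanation-weight prior can stay for CHSIMS, but the current `0.02` strength is not recommended for the main result.

4. `dropout=0.20` helps Zero recall: `Zero_recall=0.7542` and `Zero_pred_rate=0.4836` are both the highest of all runs. It does not improve Has0, Non0, or Corr, which suggests that stronger dropout mainly makes the model more willing to predict neutral/zero rather than more accurate on sentiment intensity overall.

5. Changing the winner's seed causes a clear drop: with `seed1`, Corr falls from `0.8137` to `0.7889` and MAE rises from `0.2878` to `0.3000`. CHSIMS has only 2281 samples, so seed sensitivity appears stronger than on MOSEI; for a paper table, `h128/lr1e-4/b8/d0.12` and `h112/lr1e-4/b8/d0.12` should each be re-validated with at least 4 seeds.

6. Router weights did not collapse. Across the strong configurations, the text/audio/vision weights fall roughly within text `0.35-0.43`, audio `0.31-0.38`, and vision `0.24-0.29`. Compared with MOSEI 0615, the CHSIMS audio weight is closer to the text weight, which suggests that although the English text features are strong, acoustic cues still contribute steadily to routing on CHSIMS.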
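
The "no collapse" observation can be made concrete with a normalized entropy over each run's `Router [T,A,V]` weights (1.0 means uniform routing, 0.0 means all weight on one modality). This metric is an illustrative choice made here, not something the AWDE scripts are known to report; the weights are copied from the tables in Sections 4 and 5.

```python
import math

# Router [T, A, V] weights from the Search1 and Search2 result tables.
ROUTER_WEIGHTS = {
    "chsims_eng_h128_lr1e4_b12_d12": [0.386454, 0.364212, 0.249335],
    "chsims_eng_h128_lr1e4_b8_d12": [0.368937, 0.354252, 0.276811],
    "chsims_eng_h128_lr5e5_b8_d15": [0.376313, 0.356440, 0.267247],
    "chsims_eng_h160_lr5e5_b8_d15": [0.366587, 0.346516, 0.286897],
    "chsims_eng_s2_h112_lr1e4_b8_d12": [0.408392, 0.352634, 0.238974],
    "chsims_eng_s2_h128_lr1e4_b8_d12_rkl002": [0.347068, 0.382365, 0.270567],
    "chsims_eng_s2_h128_lr1e4_b8_d12_seed1": [0.428329, 0.311010, 0.260661],
    "chsims_eng_s2_h128_lr1e4_b8_d20": [0.374855, 0.345430, 0.279715],
}


def normalized_entropy(weights):
    """Shannon entropy divided by log(n); 1.0 = uniform, 0.0 = fully collapsed."""
    entropy = -sum(w * math.log(w) for w in weights if w > 0)
    return entropy / math.log(len(weights))


scores = {run: normalized_entropy(w) for run, w in ROUTER_WEIGHTS.items()}
for run, value in scores.items():
    print(f"{run}: {value:.4f}")
```

All 8 runs stay close to the uniform end of the scale, so no single modality dominates the average routing.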

## 9. Conclusions and Next Steps

Recommended main configuration:

```text
run = chsims_eng_h128_lr1e4_b8_d12
hidden_dim = 128
lr = 1e-4
batch_size = 8
dropout = 0.12
selection = EMA + valid composite
```

Main result:

| Run | Has0_acc_2 | Has0_F1_score | Non0_acc_2 | Non0_F1_score | Mult_acc_3 | Mult_acc_5 | Mult_acc_7 | MAE | Corr | Zero_F1 |
| --- | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: | ---: |
| `chsims_eng_h128_lr1e4_b8_d12` | 0.8249 | 0.8211 | 0.9046 | 0.9041 | 0.7155 | 0.7155 | 0.7155 | 0.2878 | 0.8137 | 0.6837 |

If continuous-value error is the priority, the following run can be reported or re-validated:

```text
run = chsims_eng_s2_h112_lr1e4_b8_d12
MAE = 0.2796
Corr = 0.8089
```

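Suggestions for the next round:

- Paper/stable protocol: run `h128/lr1e-4/b8/d0.12` with 4 seeds and focus on the mean and variance of Has0, Non0, Corr, and Zero-F1.
- MAE protocol: run `h112/lr1e-4/b8/d0.12` with 4 seeds to confirm whether the low-capacity MAE advantage is stable.
- Raising dropout further or keeping router KL at its current strength is not recommended; neither beat the main winner.
- To make `Mult_acc_3/5/7` more discriminative on CHSIMS, the rounding/clipping design of the multi-class metric needs a separate review for the `[-1, 1]` label space; otherwise the three columns will remain identical.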
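
As one possible direction for the metric review, multi-class accuracy on `[-1, 1]` labels could use interval bins instead of integer rounding. The sketch below is a hedged illustration only: the bin edges, function names, labels, and predictions are all hypothetical, and the actual cut points would need to be chosen and validated separately.

```python
from bisect import bisect_right

# Hypothetical bin edges for a [-1, 1] label space (assumptions, not a fixed standard).
EDGES_3 = [-0.1, 0.1]              # negative / neutral / positive
EDGES_5 = [-0.7, -0.1, 0.1, 0.7]   # adds strong-negative and strong-positive bins


def to_bin(value, edges):
    """Index of the interval that contains value; each edge belongs to the bin on its right."""
    return bisect_right(edges, value)


def binned_acc(preds, labels, edges):
    """Fraction of samples whose prediction and label fall in the same interval."""
    hits = sum(to_bin(p, edges) == to_bin(y, edges) for p, y in zip(preds, labels))
    return hits / len(labels)


# Made-up labels on the CHSIMS grid and made-up predictions.
labels = [-1.0, -0.6, -0.2, 0.0, 0.4, 0.8, 1.0]
preds = [-0.9, -0.3, -0.1, 0.2, 0.7, 0.6, 1.2]

acc3 = binned_acc(preds, labels, EDGES_3)
acc5 = binned_acc(preds, labels, EDGES_5)
print(f"binned acc3={acc3:.4f} acc5={acc5:.4f}")
```

With interval bins the 3-class and 5-class scores separate on this toy input, because the finer edges fall inside the range the labels actually occupy.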

## 10. Paths for Verification

Core result files:

```text
/root/siton-data-531cb60d91bd4013b805b412b0be2176/tlw/store/AWDE/0617-CHSIMS-eng/*/status.json
/root/siton-data-531cb60d91bd4013b805b412b0be2176/tlw/store/AWDE/0617-CHSIMS-eng/*/AWDE_0617_CHSIMS_ENG_EXPERIMENT.md
/root/siton-data-531cb60d91bd4013b805b412b0be2176/tlw/store/AWDE/0617-CHSIMS-eng/*/awde_0617_chsims_eng_results.txt
```

Launch scripts:

```text
/root/AWDE/0617-CHSIMS-eng/scripts/start_0617_chsims_eng_npu4.sh
/root/AWDE/0617-CHSIMS-eng/scripts/start_0617_chsims_eng_search2_npu4.sh
```

Data build statistics:

```text
/root/siton-data-531cb60d91bd4013b805b412b0be2176/tlw/store/AWDE/0617-CHSIMS-eng/chsims_awde_0617_eng_build_stats.json
```
