
1150317 meeting

Wrap-up work v2.0

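This round of experiments was originally intended to retest Weather2K and NASA GES DISC MERRA-2 over the same data range as the previous experiment, but without time reshaping. On the NCHC (National Center for High-performance Computing) machines, however, the model exceeded the vRAM limit of a single GPU. All experiments were therefore re-parameterized: the number of residual layers was reduced and every experiment was rerun, so that usable treatment and control groups are available.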

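This round also refactored all code previously used for the experiments (the $SSSD^{S4 + AFRK}$ code itself was not changed). The experimental range of both datasets was unified to a half-year interval. The forecast horizon is set to 10% of that interval, about 18 days, and the unobserved locations make up 20% of all locations.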
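As a rough illustration of the split described above, the sketch below holds out the last 10% of the time axis for forecasting and a random 20% of locations as unobserved. The function name, the seed, and the example sizes (180 daily steps, 50 locations) are assumptions for illustration only; they are not taken from the experiment code.

```python
import random

def split_time_and_locations(n_steps, n_locations, forecast_frac=0.10,
                             unknown_frac=0.20, seed=0):
    """Return (n_forecast_steps, known_ids, unknown_ids) for a hold-out split."""
    # Forecast horizon: the last `forecast_frac` of the time axis.
    n_forecast = round(n_steps * forecast_frac)
    # Unobserved locations: a random `unknown_frac` of all location indices.
    rng = random.Random(seed)
    n_unknown = round(n_locations * unknown_frac)
    unknown_ids = sorted(rng.sample(range(n_locations), n_unknown))
    unknown_set = set(unknown_ids)
    known_ids = [i for i in range(n_locations) if i not in unknown_set]
    return n_forecast, known_ids, unknown_ids

# Hypothetical sizes: 180 daily steps and 50 locations.
n_forecast, known_ids, unknown_ids = split_time_and_locations(180, 50)
print(n_forecast, len(known_ids), len(unknown_ids))
```

With these example sizes the forecast horizon is 18 steps, matching the "about 18 days" figure for a half-year daily series.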

While investigating why the original SSSD code cannot run directly on multiple GPUs, I found that many steps hard-code the compute device as CUDA, i.e.

.device("cuda")

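As a result, the SSSD code is restricted to NVIDIA GPUs and cannot run on CPU, MPS, TPU, or other devices. Hard-coding .device("cuda") also pins computation to the default GPU, normally "cuda:0", so data cannot be passed between multiple GPUs. A full rewrite would be very time-consuming, so none is planned for now; as described above, the experiments instead reduce some of the model's layers.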
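For reference, a device-agnostic alternative usually means resolving the device once and passing it around, rather than writing "cuda" at each call site. The helper below is a minimal, hypothetical sketch of that selection logic in plain Python; `pick_device` is not part of SSSD or PyTorch, and the availability flags are passed in explicitly so the logic can be read and tested without a GPU.

```python
def pick_device(preferred: str, cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device string if its backend is available, else "cpu"."""
    if preferred.startswith("cuda") and cuda_available:
        return preferred  # keeps an explicit index such as "cuda:1"
    if preferred == "mps" and mps_available:
        return preferred
    return "cpu"

print(pick_device("cuda:1", cuda_available=True, mps_available=False))
print(pick_device("cuda", cuda_available=False, mps_available=False))
```

With PyTorch, the flags would come from `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and the returned string would be wrapped in `torch.device(...)` and applied with `.to(device)` wherever the code currently hard-codes CUDA.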

Experiment overview

The following experiments were run in this round; see the next section for the detailed settings.

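| Control variable | Experiment 1 | Experiment 2 | Experiment 3 | Experiment 4 |
| --- | --- | --- | --- | --- |
| Iterations | 4,000 | 4,000 | 4,000 | 4,000 |
| Training strategy | $SSSD^{S4 + AFRK}$ | $SSSD^{S4}$ | $SSSD^{S4 + AFRK}$ | $SSSD^{S4}$ |
| Time reshaping | false | false | true | true |
| Merged time steps $p$ | | | 8 / 24 | 8 / 24 |
| Input Channels | 4 / 1 | 4 / 1 | 32 / 24 | 32 / 24 |
| S4 Max Seq. Length | 1,176 / 4,608 | 1,176 / 4,608 | 147 / 192 | 147 / 192 |
| Missing $k$ | 88 / 144 | 88 / 144 | 11 / 6 | 11 / 6 |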
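The numbers in the table are consistent with time reshaping folding $p$ consecutive time steps into the channel axis: channels are multiplied by $p$, while the sequence length and missing $k$ are divided by $p$. The sketch below only reproduces that arithmetic; `reshape_dims` is a hypothetical helper, and the actual reshaping code may order or pad the data differently.

```python
def reshape_dims(channels, seq_len, missing_k, p):
    """Dimensions after folding p consecutive time steps into the channel axis."""
    if seq_len % p or missing_k % p:
        raise ValueError("seq_len and missing_k must be divisible by p")
    return channels * p, seq_len // p, missing_k // p

# First value of each pair in the table, with p = 8.
print(reshape_dims(4, 1176, 88, 8))    # (32, 147, 11)
# Second value of each pair in the table, with p = 24.
print(reshape_dims(1, 4608, 144, 24))  # (24, 192, 6)
```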

Following the table above, the experiments are named as follows:

  • Experiment 1
    • Weather2k-1var-S4+AFRK
    • MERRA2-1var-S4+AFRK
  • Experiment 2
    • Weather2k-1var-S4
    • MERRA2-1var-S4
  • Experiment 3
    • Weather2k-S4+AFRK
    • MERRA2-S4+AFRK
  • Experiment 4
    • Weather2k-S4
    • MERRA2-S4

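Experiment settings

The settings used in this round are shown below. Only the input/output channels (channels), the time series length (s4 max sequence length), the number of missing values (missing $k$), whether AFRK is enabled (enable spatial training), and the directories and paths change from one experiment to another. The settings for Weather2k-S4+AFRK are shown below.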

model.yaml

wavenet:
  # WaveNet model parameters
  input_channels:  32  # Number of input channels
  output_channels: 32  # Number of output channels
  residual_layers: 20  # Number of residual layers
  residual_channels: 20  # Number of channels in residual blocks
  skip_channels: 20  # Number of channels in skip connections

  # Diffusion step embedding dimensions
  diffusion_step_embed_dim_input:  64  # Input dimension
  diffusion_step_embed_dim_hidden: 128  # Middle dimension
  diffusion_step_embed_dim_output: 128  # Output dimension

  # Structured State Spaces sequence model (S4) configurations
  s4_max_sequence_length: 166  # Maximum sequence length
  s4_state_dim: 128  # State dimension
  s4_dropout: 0.2  # Dropout rate
  s4_bidirectional: true  # Whether to use bidirectional layers
  s4_use_layer_norm: true  # Whether to use layer normalization

diffusion:
  # Diffusion model parameters
  T: 100  # Number of diffusion steps
  beta_0: 0.0001  # Initial beta value
  beta_T: 0.01  # Final beta value

AFRK:
# AutoFRK model parameters, for more details, please refer to https://pypi.org/project/autoFRK/
  method: "fast"  # autoFRK method to use (e.g., "fast")
  tps_method: "rectangular"  # autoFRK's TPS method to use (e.g., "rectangular")

training.yaml

# Training configuration
batch_size: 100  # Batch size
output_directory: "/home/u6025091/SSSD_CP/results/Weather2k-S4+AFRK"  # Output directory for checkpoints and logs
ckpt_iter: "max"  # Checkpoint mode (max or min)
iters_per_ckpt: 500  # Checkpoint frequency (number of iterations)
iters_per_logging: 200  # Log frequency (number of iterations)
n_iters: 4000  # Maximum number of iterations
learning_rate: 0.0005  # Learning rate

# Additional training settings
only_generate_missing: true  # Generate missing values only
use_model: 2  # Model to use for training
masking: "forecast"  # Masking strategy for missing values
missing_k: 18  # Number of missing values

# Data paths
data:
  train_path: "/home/u6025091/SSSD_CP/datasets/Weather2k/data_train_known_real.npy"  # Path to training data

# autoFRK config
enable_spatial_training: true  # Enable spatial training step
location_path: "/home/u6025091/SSSD_CP/datasets/Weather2k/stations_known_locations.npy"  # Path to known locations
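The `masking: "forecast"` and `missing_k` settings suggest that the last `missing_k` time steps are hidden from the model and must be generated. A minimal sketch of such a mask follows, assuming 1 marks an observed value and 0 a value to forecast. The function name and the list-of-lists layout are illustrative; the actual masking code may differ.

```python
def forecast_mask(seq_len, n_channels, missing_k):
    """Return a seq_len x n_channels mask: 1 = observed, 0 = to be forecast."""
    if not 0 <= missing_k <= seq_len:
        raise ValueError("missing_k must be between 0 and seq_len")
    observed = seq_len - missing_k
    # Every channel shares the same time-wise pattern.
    return [[1 if t < observed else 0] * n_channels for t in range(seq_len)]

# Values taken from the Weather2k-S4+AFRK configuration shown in this section.
mask = forecast_mask(seq_len=166, n_channels=32, missing_k=18)
print(sum(row[0] for row in mask))  # number of observed time steps
```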

inference.yaml

# Inference configuration
batch_size: 100  # Batch size for inference
output_directory: "/home/u6025091/SSSD_CP/results/Weather2k-S4+AFRK/inference"  # Output directory for inference results
ckpt_path: "/home/u6025091/SSSD_CP/results/Weather2k-S4+AFRK"  # Path to checkpoint for inference
trials: 1 # Replications

# Additional training settings
only_generate_missing: true  # Generate missing values only
use_model: 2  # Model to use for training
masking: "forecast"  # Masking strategy for missing values; inference currently supports only this strategy, other strategies fail until the code is fixed
missing_k: 18  # Number of missing values

# Data paths
data:
  test_path: "/home/u6025091/SSSD_CP/datasets/Weather2k/data_train_known_real.npy"  # Path to test data

# autoFRK config
enable_spatial_inference: true  # Enable spatial prediction step
enable_spatial_normalization: true  # Enable spatial normalization before and after autoFRK
known_location_path: "/home/u6025091/SSSD_CP/datasets/Weather2k/stations_known_locations.npy"  # Path to known locations
unknown_location_path: "/home/u6025091/SSSD_CP/datasets/Weather2k/stations_unknown_locations.npy"  # Path to unknown locations

The selected experiment locations are shown below.

https://raw.githubusercontent.com/Josh-test-lab/website-assets-repository/refs/heads/main/posts/1150317%20meeting/Weather2k.png
Weather2k location distribution.
https://raw.githubusercontent.com/Josh-test-lab/website-assets-repository/refs/heads/main/posts/1150317%20meeting/MERRA2.png
MERRA-2 location distribution.

The following sections present the results of each experiment.
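The result tables report MSPE, RMSPE, MAPE, and a "%" variant of each, but the post does not define them. The reported values are consistent with RMSPE being the square root of MSPE. The normalization behind the "%" variants is not stated, so the sketch below covers only one assumed relative metric: the absolute error divided by the absolute true value, reported as a fraction. Treat these definitions as assumptions that may not match the evaluation code.

```python
import math

def error_metrics(y_true, y_pred):
    """Prediction-error metrics under assumed definitions (see text)."""
    errors = [p - t for t, p in zip(y_true, y_pred)]
    n = len(errors)
    mspe = sum(e * e for e in errors) / n
    mape = sum(abs(e) for e in errors) / n
    # Relative absolute error; undefined when a true value is exactly zero.
    mape_rel = sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    return {"MSPE": mspe, "RMSPE": math.sqrt(mspe),
            "MAPE": mape, "MAPE%": mape_rel}

# True values far from zero keep the relative error small.
far_from_zero = error_metrics([270.0, 280.0], [271.0, 278.0])
# A true value close to zero makes the relative error explode.
near_zero = error_metrics([1e-6, 280.0], [1.0, 278.0])
print(far_from_zero)
print(near_zero)
```

True values close to zero are one possible explanation for very large "%" values, such as those in the Weather2k tables, although the post does not confirm the cause.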

Experiment 1

Weather2k-1var-S4+AFRK

Training time: 35h 25m 14.82s (note: this experiment did not run on a dedicated machine; it shared a machine with Weather2k-1var-S4, Weather2k-S4, and Weather2k-S4+AFRK during training)
Number of MRTS basis functions used by AFRK at inference: 382

| Metric | All Locs & All Time | Known Locs & All Time | Unknown Locs & All Time | All Locs & Future | Known Locs & Future | Unknown Locs & Future | All Locs & Past | Known Locs & Past | Unknown Locs & Past |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MSPE | 1.494639e+01 | 1.160173e+01 | 2.828929e+01 | 7.756372e+01 | 7.868970e+01 | 7.307184e+01 | 8.156561e+00 | 4.327126e+00 | 2.343335e+01 |
| RMSPE | 3.866056e+00 | 3.406131e+00 | 5.318767e+00 | 8.807027e+00 | 8.870722e+00 | 8.548207e+00 | 2.855969e+00 | 2.080175e+00 | 4.840800e+00 |
| MSPE% | 1.380822e+10 | 3.299012e+09 | 5.573267e+10 | 2.246783e+10 | 2.608424e+10 | 8.040864e+09 | 1.286923e+10 | 8.283242e+08 | 6.090407e+10 |
| RMSPE% | 1.175084e+05 | 5.743703e+04 | 2.360777e+05 | 1.498927e+05 | 1.615062e+05 | 8.967086e+04 | 1.134426e+05 | 2.878062e+04 | 2.467875e+05 |
| MAPE | 2.332151e+00 | 2.120980e+00 | 3.174576e+00 | 5.441194e+00 | 5.450313e+00 | 5.404813e+00 | 1.995026e+00 | 1.759968e+00 | 2.932743e+00 |
| MAPE% | 7.822800e+08 | 5.627048e+08 | 1.658233e+09 | 5.047446e+08 | 5.480241e+08 | 3.320892e+08 | 8.123742e+08 | 5.642966e+08 | 1.802031e+09 |

MERRA2-1var-S4+AFRK

Training time: 20h 24m 12.59s
Number of MRTS basis functions used by AFRK at inference: 301

| Metric | All Locs & All Time | Known Locs & All Time | Unknown Locs & All Time | All Locs & Future | Known Locs & Future | Unknown Locs & Future | All Locs & Past | Known Locs & Past | Unknown Locs & Past |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MSPE | 160.204191 | 160.390768 | 159.457883 | 1628.647842 | 1631.785612 | 1616.096765 | 0.975361 | 0.841929 | 1.509089 |
| RMSPE | 12.657179 | 12.664548 | 12.627663 | 40.356509 | 40.395366 | 40.200706 | 0.987604 | 0.917567 | 1.228450 |
| MSPE% | 0.611571 | 0.612622 | 0.607367 | 6.222452 | 6.237316 | 6.162993 | 0.003162 | 0.002715 | 0.004950 |
| RMSPE% | 0.782030 | 0.782702 | 0.779338 | 2.494484 | 2.497462 | 2.482538 | 0.056233 | 0.052108 | 0.070357 |
| MAPE | 3.875349 | 3.801459 | 4.170910 | 34.341577 | 34.369929 | 34.228172 | 0.571782 | 0.486805 | 0.911689 |
| MAPE% | 0.014226 | 0.013981 | 0.015206 | 0.128176 | 0.128351 | 0.127477 | 0.001869 | 0.001579 | 0.003032 |

Experiment 2

Weather2k-1var-S4

Training time: 10h 3m 41.84s (note: this experiment did not run on a dedicated machine; it shared a machine with Weather2k-1var-S4+AFRK during training)
Number of MRTS basis functions used by AFRK at inference: 382

| Metric | All Locs & All Time | Known Locs & All Time | Unknown Locs & All Time | All Locs & Future | Known Locs & Future | Unknown Locs & Future | All Locs & Past | Known Locs & Past | Unknown Locs & Past |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MSPE | 1.160546e+01 | 8.335466e+00 | 2.465048e+01 | 8.308004e+01 | 8.508390e+01 | 7.508604e+01 | 3.855207e+00 | 1.334625e-02 | 1.918156e+01 |
| RMSPE | 3.406679e+00 | 2.887121e+00 | 4.964925e+00 | 9.114825e+00 | 9.224093e+00 | 8.665220e+00 | 1.963468e+00 | 1.155260e-01 | 4.379676e+00 |
| MSPE% | 1.214263e+10 | 2.615445e+09 | 5.014950e+10 | 2.295537e+10 | 2.672051e+10 | 7.935082e+09 | 1.097017e+10 | 1.642926e+06 | 5.472697e+10 |
| RMSPE% | 1.101936e+05 | 5.114142e+04 | 2.239408e+05 | 1.515103e+05 | 1.634641e+05 | 8.907908e+04 | 1.047386e+05 | 1.281767e+03 | 2.339380e+05 |
| MAPE | 1.034689e+00 | 6.119608e-01 | 2.721080e+00 | 5.516763e+00 | 5.541266e+00 | 5.419014e+00 | 5.486807e-01 | 7.745780e-02 | 2.428533e+00 |
| MAPE% | 3.535854e+08 | 7.255939e+07 | 1.474684e+09 | 4.961357e+08 | 5.373860e+08 | 3.315754e+08 | 3.381282e+08 | 2.215650e+07 | 1.598635e+09 |

MERRA2-1var-S4

Training time: 16h 48m 57.28s
Number of MRTS basis functions used by AFRK at inference: 301

| Metric | All Locs & All Time | Known Locs & All Time | Unknown Locs & All Time | All Locs & Future | Known Locs & Future | Unknown Locs & Future | All Locs & Past | Known Locs & Past | Unknown Locs & Past |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MSPE | 153.309263 | 153.324779 | 153.247198 | 1564.356985 | 1566.560730 | 1555.542007 | 0.304088 | 0.082327 | 1.191135 |
| RMSPE | 12.381812 | 12.382438 | 12.379305 | 39.551953 | 39.579802 | 39.440360 | 0.551442 | 0.286926 | 1.091391 |
| MSPE% | 0.587130 | 0.587618 | 0.585178 | 5.992495 | 6.004252 | 5.945470 | 0.001006 | 0.000272 | 0.003941 |
| RMSPE% | 0.766244 | 0.766563 | 0.764969 | 2.447957 | 2.450357 | 2.438333 | 0.031721 | 0.016505 | 0.062781 |
| MAPE | 3.651232 | 3.550918 | 4.052489 | 33.980246 | 33.997542 | 33.911060 | 0.362544 | 0.249477 | 0.814813 |
| MAPE% | 0.013523 | 0.013196 | 0.014834 | 0.127042 | 0.127182 | 0.126480 | 0.001214 | 0.000836 | 0.002728 |

Experiment 3

Weather2k-S4+AFRK

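Training time: (note: this experiment did not run on a dedicated machine; it shared a machine with Weather2k-1var-S4+AFRK during training)
Number of MRTS basis functions used by AFRK at inference: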

MERRA2-S4+AFRK

Training time:
Number of MRTS basis functions used by AFRK at inference:

Experiment 4

Weather2k-S4

Training time: 4h 31m 17.40s (note: this experiment did not run on a dedicated machine; it shared a machine with Weather2k-1var-S4+AFRK during training)
Number of MRTS basis functions used by AFRK at inference: 178

| Metric | All Locs & All Time | Known Locs & All Time | Unknown Locs & All Time | All Locs & Future | Known Locs & Future | Unknown Locs & Future | All Locs & Past | Known Locs & Past | Unknown Locs & Past |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MSPE | 2.729977e+01 | 2.688188e+01 | 2.896683e+01 | 2.397620e+02 | 2.747339e+02 | 1.002486e+02 | 4.261693e+00 | 6.368540e-03 | 2.123748e+01 |
| RMSPE | 5.224918e+00 | 5.184774e+00 | 5.382084e+00 | 1.548425e+01 | 1.657510e+01 | 1.001242e+01 | 2.064387e+00 | 7.980313e-02 | 4.608414e+00 |
| MSPE% | 1.143239e+10 | 2.267235e+09 | 4.799498e+10 | 2.030899e+10 | 2.313886e+10 | 9.019811e+09 | 1.046987e+10 | 4.046914e+06 | 5.222121e+10 |
| RMSPE% | 1.069224e+05 | 4.761549e+04 | 2.190776e+05 | 1.425096e+05 | 1.521146e+05 | 9.497268e+04 | 1.023224e+05 | 2.011694e+03 | 2.285196e+05 |
| MAPE | 1.437579e+00 | 1.047657e+00 | 2.993092e+00 | 9.437545e+00 | 1.014651e+01 | 6.609263e+00 | 5.701122e-01 | 6.103477e-02 | 2.600977e+00 |
| MAPE% | 3.696454e+08 | 7.675813e+07 | 1.538062e+09 | 4.665294e+08 | 4.950431e+08 | 3.527794e+08 | 3.591399e+08 | 3.140192e+07 | 1.666586e+09 |

MERRA2-S4

Training time: 2h 14m 26.41s
Number of MRTS basis functions used by AFRK at inference: 111

| Metric | All Locs & All Time | Known Locs & All Time | Unknown Locs & All Time | All Locs & Future | Known Locs & Future | Unknown Locs & Future | All Locs & Past | Known Locs & Past | Unknown Locs & Past |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MSPE | 183.060926 | 194.459146 | 137.468045 | 1864.618866 | 1987.368928 | 1373.618619 | 0.723318 | 0.047242 | 3.427622 |
| RMSPE | 13.530001 | 13.944861 | 11.724677 | 43.181233 | 44.579916 | 37.062361 | 0.850481 | 0.217352 | 1.851384 |
| MSPE% | 0.680973 | 0.722399 | 0.515269 | 6.939079 | 7.383059 | 5.163158 | 0.002383 | 0.000159 | 0.011281 |
| RMSPE% | 0.825211 | 0.849941 | 0.717822 | 2.634213 | 2.717179 | 2.272258 | 0.048819 | 0.012604 | 0.106213 |
| MAPE | 3.821658 | 3.698350 | 4.314889 | 35.220395 | 36.220564 | 31.219723 | 0.416976 | 0.171845 | 1.397498 |
| MAPE% | 0.013882 | 0.013486 | 0.015466 | 0.129019 | 0.132507 | 0.115069 | 0.001397 | 0.000580 | 0.004666 |

Bonus

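All of the experiments above use a half-year time series, but reducing the number of residual layers appears to have hurt what the model learns, and the MERRA-2 experiments are especially poor. Two extra experiments were therefore run during this round, again on the MERRA-2 dataset (it is the larger dataset and reaches the vRAM limit more easily over the same time range; vRAM usage stayed above 30 GB throughout training in the experiments below). Each one tests the following parameter settings with a reduced input batch_size:

Experiment 5 - MERRA2-1var-S4+AFRK-low-batch

This experiment uses the same training set as MERRA2-1var-S4+AFRK but lowers some parameters while raising the model size toward the settings used in the previous meeting. Because of the vRAM limit the settings cannot match exactly. The specific parameters are: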

model.yaml

wavenet:
  # WaveNet model parameters
  input_channels:  1  # Number of input channels
  output_channels: 1  # Number of output channels
  residual_layers: 20  # Number of residual layers
  residual_channels: 36  # Number of channels in residual blocks
  skip_channels: 36  # Number of channels in skip connections

  # Diffusion step embedding dimensions
  diffusion_step_embed_dim_input:  128  # Input dimension
  diffusion_step_embed_dim_hidden: 256  # Middle dimension
  diffusion_step_embed_dim_output: 256  # Output dimension

  # Structured State Spaces sequence model (S4) configurations
  s4_max_sequence_length: 3984  # Maximum sequence length
  s4_state_dim: 128  # State dimension
  s4_dropout: 0.2  # Dropout rate
  s4_bidirectional: true  # Whether to use bidirectional layers
  s4_use_layer_norm: true  # Whether to use layer normalization

diffusion:
  # Diffusion model parameters
  T: 100  # Number of diffusion steps
  beta_0: 0.0001  # Initial beta value
  beta_T: 0.01  # Final beta value

AFRK:
# AutoFRK model parameters, for more details, please refer to https://pypi.org/project/autoFRK/
  method: "fast"  # autoFRK method to use (e.g., "fast")
  tps_method: "rectangular"  # autoFRK's TPS method to use (e.g., "rectangular")

training.yaml

# Training configuration
batch_size: 60  # Batch size
output_directory: "/home/u6025091/SSSD_CP/results/MERRA2-1var-S4+AFRK-lowbatch"  # Output directory for checkpoints and logs
ckpt_iter: "max"  # Checkpoint mode (max or min)
iters_per_ckpt: 500  # Checkpoint frequency (number of iterations)
iters_per_logging: 200  # Log frequency (number of iterations)
n_iters: 4000  # Maximum number of iterations
learning_rate: 0.0005  # Learning rate

# Additional training settings
only_generate_missing: true  # Generate missing values only
use_model: 2  # Model to use for training
masking: "forecast"  # Masking strategy for missing values
missing_k: 432  # Number of missing values

# Data paths
data:
  train_path: "/home/u6025091/SSSD_CP/datasets/MERRA2-1var/data_train_known_real.npy"  # Path to training data

# autoFRK config
enable_spatial_training: true  # Enable spatial training step
location_path: "/home/u6025091/SSSD_CP/datasets/MERRA2-1var/stations_known_locations.npy"  # Path to known locations

Training time: 35h 11m 12.33s
Number of MRTS basis functions used by AFRK at inference: 301

| Metric | All Locs & All Time | Known Locs & All Time | Unknown Locs & All Time | All Locs & Future | Known Locs & Future | Unknown Locs & Future | All Locs & Past | Known Locs & Past | Unknown Locs & Past |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MSPE | 164.784290 | 164.888406 | 164.367824 | 1668.971215 | 1671.274799 | 1659.756880 | 1.679684 | 1.545304 | 2.217204 |
| RMSPE | 12.836833 | 12.840888 | 12.820602 | 40.853044 | 40.881228 | 40.740114 | 1.296026 | 1.243102 | 1.489028 |
| MSPE% | 0.632449 | 0.633265 | 0.629184 | 6.413790 | 6.426202 | 6.364142 | 0.005557 | 0.005116 | 0.007321 |
| RMSPE% | 0.795267 | 0.795780 | 0.793211 | 2.532546 | 2.534995 | 2.522725 | 0.074542 | 0.071523 | 0.085561 |
| MAPE | 4.193186 | 4.136441 | 4.420163 | 34.907394 | 34.918514 | 34.862917 | 0.862729 | 0.798626 | 1.119142 |
| MAPE% | 0.015369 | 0.015185 | 0.016103 | 0.130691 | 0.130813 | 0.130204 | 0.002864 | 0.002647 | 0.003731 |

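Experiment 6 - MERRA2-1var-S4+AFRK-low-day

This experiment likewise lowers some parameters, but raises the model size to settings even closer to those of the previous meeting and shortens the training time series to 3 months. The specific parameters are: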

model.yaml

wavenet:
  # WaveNet model parameters
  input_channels:  1  # Number of input channels
  output_channels: 1  # Number of output channels
  residual_layers: 28  # Number of residual layers
  residual_channels: 40  # Number of channels in residual blocks
  skip_channels: 40  # Number of channels in skip connections

  # Diffusion step embedding dimensions
  diffusion_step_embed_dim_input:  128  # Input dimension
  diffusion_step_embed_dim_hidden: 256  # Middle dimension
  diffusion_step_embed_dim_output: 256  # Output dimension

  # Structured State Spaces sequence model (S4) configurations
  s4_max_sequence_length: 1992  # Maximum sequence length
  s4_state_dim: 128  # State dimension
  s4_dropout: 0.2  # Dropout rate
  s4_bidirectional: true  # Whether to use bidirectional layers
  s4_use_layer_norm: true  # Whether to use layer normalization

diffusion:
  # Diffusion model parameters
  T: 100  # Number of diffusion steps
  beta_0: 0.0001  # Initial beta value
  beta_T: 0.01  # Final beta value

AFRK:
# AutoFRK model parameters, for more details, please refer to https://pypi.org/project/autoFRK/
  method: "fast"  # autoFRK method to use (e.g., "fast")
  tps_method: "rectangular"  # autoFRK's TPS method to use (e.g., "rectangular")

training.yaml

# Training configuration
batch_size: 80  # Batch size
output_directory: "/home/u6025091/SSSD_CP/results/MERRA2-1var-S4+AFRK-final-low-days"  # Output directory for checkpoints and logs
ckpt_iter: "max"  # Checkpoint mode (max or min)
iters_per_ckpt: 500  # Checkpoint frequency (number of iterations)
iters_per_logging: 200  # Log frequency (number of iterations)
n_iters: 4000  # Maximum number of iterations
learning_rate: 0.0005  # Learning rate

# Additional training settings
only_generate_missing: true  # Generate missing values only
use_model: 2  # Model to use for training
masking: "forecast"  # Masking strategy for missing values
missing_k: 216  # Number of missing values

# Data paths
data:
  train_path: "/home/u6025091/SSSD_CP/datasets/MERRA2-1var-final-low-days/data_train_known_real.npy"  # Path to training data

# autoFRK config
enable_spatial_training: true  # Enable spatial training step
location_path: "/home/u6025091/SSSD_CP/datasets/MERRA2-1var-final-low-days/stations_known_locations.npy"  # Path to known locations
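As a sanity check on the sequence settings of Experiments 5 and 6, the snippet below converts time steps to days. It assumes the MERRA-2 series is hourly (24 steps per day); that resolution is suggested by the merged time step $p = 24$ in the overview table but is not stated explicitly, so treat it as an assumption.

```python
STEPS_PER_DAY = 24  # assumed hourly resolution for MERRA-2

def steps_to_days(n_steps, steps_per_day=STEPS_PER_DAY):
    """Convert a number of time steps to days."""
    return n_steps / steps_per_day

# (s4_max_sequence_length, missing_k) copied from the two configurations.
for name, seq_len, missing_k in [
    ("Experiment 5", 3984, 432),
    ("Experiment 6", 1992, 216),
]:
    print(name, steps_to_days(seq_len), steps_to_days(missing_k),
          round(missing_k / seq_len, 3))
```

Under that assumption, Experiment 5 covers 166 days with an 18-day forecast horizon and Experiment 6 covers 83 days with a 9-day horizon, roughly 11% of the sequence in both cases.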

Training time: 24h 25m 30.20s
Number of MRTS basis functions used by AFRK at inference: 301

| Metric | All Locs & All Time | Known Locs & All Time | Unknown Locs & All Time | All Locs & Future | Known Locs & Future | Unknown Locs & Future | All Locs & Past | Known Locs & Past | Unknown Locs & Past |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| MSPE | 1.346637 | 1.077605 | 2.422764 | 10.404323 | 10.459679 | 10.182895 | 0.364479 | 0.060272 | 1.581304 |
| RMSPE | 1.160447 | 1.038078 | 1.556523 | 3.225573 | 3.234143 | 3.191065 | 0.603721 | 0.245504 | 1.257499 |
| MSPE% | 0.004831 | 0.003865 | 0.008696 | 0.037348 | 0.037534 | 0.036602 | 0.001305 | 0.000214 | 0.005670 |
| RMSPE% | 0.069507 | 0.062171 | 0.093250 | 0.193256 | 0.193738 | 0.191316 | 0.036130 | 0.014640 | 0.075297 |
| MAPE | 0.517593 | 0.406171 | 0.963280 | 2.524476 | 2.528199 | 2.509585 | 0.299979 | 0.176072 | 0.795608 |
| MAPE% | 0.001852 | 0.001454 | 0.003446 | 0.009074 | 0.009087 | 0.009024 | 0.001069 | 0.000626 | 0.002841 |

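Conclusion

So far, the MERRA2-1var-S4+AFRK-low-day experiment with the Experiment 6 settings performs best across the evaluation metrics. The gain is largest for future time points, where MSPE, RMSPE, and MAPE all drop sharply (for example, RMSPE on All Locs & Future is 3.23, versus 40.36 for MERRA2-1var-S4+AFRK in Experiment 1), and performance on past time points is also good. The next round of experiments should therefore start from these settings and repeat the runs more times to verify their stability and reliability.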
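To put a number on the improvement, the snippet below computes the relative reduction of the "All Locs & Future" metrics between MERRA2-1var-S4+AFRK (Experiment 1) and Experiment 6, using values copied from the result tables. The two experiments cover different time ranges, so the comparison is indicative only; `relative_reduction` is a helper introduced here for illustration.

```python
def relative_reduction(baseline, new):
    """Fraction by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline

# (Experiment 1 MERRA2-1var-S4+AFRK, Experiment 6) on All Locs & Future.
future = {
    "MSPE": (1628.647842, 10.404323),
    "RMSPE": (40.356509, 3.225573),
    "MAPE": (34.341577, 2.524476),
}
for metric, (baseline, new) in future.items():
    print(f"{metric}: {relative_reduction(baseline, new):.1%}")
```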

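The open question is whether the Weather2k time series should be shortened in the same way; the professor's guidance on this would be appreciated.

Some of the experiments above are left blank because they have not finished yet. Apologies for the gaps.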

https://raw.githubusercontent.com/Josh-test-lab/website-assets-repository/refs/heads/main/posts/1150317%20meeting/not%20yet%20finish.png
Experiments in progress.

Experiment snapshots

This round of experiments used 4 NCHC machines in total; snapshots from 3 of them are shown below.

https://raw.githubusercontent.com/Josh-test-lab/website-assets-repository/refs/heads/main/posts/1150317%20meeting/Experimental%20snapshot.png
Experiment snapshot.

Paper draft

References

  • Zhu X, Xiong Y, Wu M, et al. Weather2K: A Multivariate Spatio-Temporal Benchmark Dataset for Meteorological Forecasting Based on Real-Time Observation Data from Ground Weather Stations. International Conference on Artificial Intelligence and Statistics. PMLR, 2023: 2704-2722.
  • Lopez Alcaraz J, Strodthoff N. Diffusion-based time series imputation and forecasting with structured state space models. Transactions on Machine Learning Research, 2022. https://openreview.net/forum?id=hHiIbk7ApW
  • SSSD. GitHub, 2022. https://github.com/AI4HealthUOL/SSSD