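While reading and testing the SSSD_CP repository, I first installed Linux, conda, and Docker, since this SSSD fork requires all three. My operating system is Windows, so I chose Windows Subsystem for Linux (WSL) as the working environment. During testing I found that Windows ends lines with \r\n, commonly called CRLF, whereas Linux, Unix, and macOS use \n, commonly called LF. As a result, any file that has been edited on Windows may fail to run under WSL, because the extra \r causes file path errors.
We therefore need dos2unix, a Linux package, to convert the line endings in these files to the form Linux accepts before the .sh scripts can run.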
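As a rough illustration of what the conversion does, here is a minimal Python sketch that rewrites CRLF line endings as LF. The helper name `crlf_to_lf` is my own and hypothetical; it only mimics the basic behaviour of dos2unix and is not part of SSSD_CP.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Replace Windows CRLF line endings with Unix LF (hypothetical helper)."""
    return data.replace(b"\r\n", b"\n")


# A script saved on Windows carries a trailing \r on every line,
# including the shebang line, which is what breaks path lookups.
script = b"#!/bin/bash\r\necho hello\r\n"
print(crlf_to_lf(script))
```

Working on bytes rather than text avoids any guess about the file's encoding, since only the line-ending bytes are touched.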
wavenet:
  # WaveNet model parameters
  input_channels: 14  # Number of input channels
  output_channels: 14  # Number of output channels
  residual_layers: 36  # Number of residual layers
  residual_channels: 256  # Number of channels in residual blocks
  skip_channels: 256  # Number of channels in skip connections

  # Diffusion step embedding dimensions
  diffusion_step_embed_dim_input: 128  # Input dimension
  diffusion_step_embed_dim_hidden: 512  # Middle dimension
  diffusion_step_embed_dim_output: 512  # Output dimension

  # Structured State Spaces sequence model (S4) configurations
  s4_max_sequence_length: 100  # Maximum sequence length
  s4_state_dim: 64  # State dimension
  s4_dropout: 0.0  # Dropout rate
  s4_bidirectional: true  # Whether to use bidirectional layers
  s4_use_layer_norm: true  # Whether to use layer normalization

diffusion:
  # Diffusion model parameters
  T: 200  # Number of diffusion steps
  beta_0: 0.0001  # Initial beta value
  beta_T: 0.02  # Final beta value
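Here I changed only input_channels and output_channels, so that the model's input and output match the number of features in the dataset, and s4_max_sequence_length, so that it matches the number of time steps.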
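The mapping from dataset shape to these three settings can be sketched as follows. This assumes the training array is laid out as (samples, time steps, features), which I have not verified against the data loader; `overrides_from_shape` and the sample count 8000 are hypothetical.

```python
from typing import Dict, Tuple


def overrides_from_shape(shape: Tuple[int, int, int]) -> Dict[str, int]:
    """Derive the data-dependent model.yaml values (assumed array layout)."""
    _n_samples, n_steps, n_features = shape  # assumed (samples, steps, features)
    return {
        "input_channels": n_features,
        "output_channels": n_features,
        "s4_max_sequence_length": n_steps,
    }


# Hypothetical shape: 100 and 14 match the config above, 8000 is made up.
print(overrides_from_shape((8000, 100, 14)))
```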
training.yaml
# Training configuration
batch_size: 80  # Batch size
output_directory: "/mnt/d/Code/sssd_cp_learning_and_testing/learning_and_testing/results/checkpoint"  # Output directory for checkpoints and logs
ckpt_iter: "max"  # Checkpoint mode (max or min)
iters_per_ckpt: 1000  # Checkpoint frequency (number of epochs)
iters_per_logging: 1000  # Log frequency (number of iterations)
n_iters: 60000  # Maximum number of iterations
learning_rate: 0.0002  # Learning rate

# Additional training settings
only_generate_missing: true  # Generate missing values only
use_model: 2  # Model to use for training
masking: "forecast"  # Masking strategy for missing values
missing_k: 24  # Number of missing values

# Data paths
data:
  train_path: "/mnt/d/Code/sssd_cp_learning_and_testing/learning_and_testing/SSSD/datasets/Mujoco/train_mujoco.npy"  # Path to training data
The following is the terminal output from a successful run of the training script.
user@LAPTOP-KOPTLCHM:/mnt/d/Code/sssd_cp_learning_and_testing/learning_and_testing/SSSD_CP$ conda activate sssd
(sssd) user@LAPTOP-KOPTLCHM:/mnt/d/Code/sssd_cp_learning_and_testing/learning_and_testing/SSSD_CP$ ./scripts/diffusion/training_job.sh -m configs/model.yaml -t configs/training.yaml
Script is running from: /mnt/d/Code/sssd_cp_learning_and_testing/learning_and_testing/SSSD_CP/scripts/diffusion
Intializing conda
Activating Conda Env: sssd
[Execution - Training]/mnt/d/Code/sssd_cp_learning_and_testing/learning_and_testing/SSSD_CP/scripts/diffusion/train.py --model_config configs/model.yaml --training_config configs/training.yaml
2025-03-16 16:51:31,034 - sssd.utils.logger - INFO - Model spec: {'wavenet': {'input_channels': 14, 'output_channels': 14, 'residual_layers': 36, 'residual_channels': 256, 'skip_channels': 256, 'diffusion_step_embed_dim_input': 128, 'diffusion_step_embed_dim_hidden': 512, 'diffusion_step_embed_dim_output': 512, 's4_max_sequence_length': 100, 's4_state_dim': 64, 's4_dropout': 0.0, 's4_bidirectional': True, 's4_use_layer_norm': True}, 'diffusion': {'T': 200, 'beta_0': 0.0001, 'beta_T': 0.02}}
2025-03-16 16:51:31,034 - sssd.utils.logger - INFO - Training spec: {'batch_size': 80, 'output_directory': '/mnt/d/Code/sssd_cp_learning_and_testing/learning_and_testing/results/checkpoint', 'ckpt_iter': 'max', 'iters_per_ckpt': 1000, 'iters_per_logging': 1000, 'n_iters': 60000, 'learning_rate': 0.0002, 'only_generate_missing': True, 'use_model': 2, 'masking': 'forecast', 'missing_k': 24, 'data': {'train_path': '/mnt/d/Code/sssd_cp_learning_and_testing/learning_and_testing/SSSD/datasets/Mujoco/train_mujoco.npy'}}
2025-03-16 16:51:31,190 - sssd.utils.logger - INFO - Using 1 GPUs!
2025-03-16 16:51:31,287 - sssd.utils.logger - INFO - Output directory /mnt/d/Code/sssd_cp_learning_and_testing/learning_and_testing/results/checkpoint/T200_beta00.0001_betaT0.02
2025-03-16 16:51:42,974 - sssd.utils.logger - INFO - Current time: 2025-03-16 16:51:42
2025-03-16 16:51:44,226 - sssd.utils.logger - INFO - No valid checkpoint model found, start training from initialization.
2025-03-16 16:51:44,227 - sssd.utils.logger - INFO - Start the 1 iteration
3%|███▌ | 3/100 [01:46<56:43, 35.09s/it
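The output directory in this log ends with T200_beta00.0001_betaT0.02, which appears to be built from the diffusion settings in model.yaml. The sketch below reproduces that name; `run_dir_name` is a hypothetical helper inferred from the log line, not the repository's own code.

```python
def run_dir_name(T: int, beta_0: float, beta_T: float) -> str:
    """Build a run directory name from the diffusion settings (inferred format)."""
    return f"T{T}_beta0{beta_0}_betaT{beta_T}"


print(run_dir_name(200, 0.0001, 0.02))
```

If the naming really does work this way, changing T, beta_0, or beta_T would send checkpoints to a different subdirectory, which is worth keeping in mind when resuming a run.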
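By contrast, launching the same script after conda deactivate, with no conda environment active, fails because the conda command cannot be found:
(base) user@LAPTOP-KOPTLCHM:/mnt/d/Code/sssd_cp_learning_and_testing/learning_and_testing/SSSD_CP$ conda deactivate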
user@LAPTOP-KOPTLCHM:/mnt/d/Code/sssd_cp_learning_and_testing/learning_and_testing/SSSD_CP$ ./scripts/diffusion/training_job.sh -m configs/model.yaml -t configs/training.yaml
Script is running from: /mnt/d/Code/sssd_cp_learning_and_testing/learning_and_testing/SSSD_CP/scripts/diffusion
/mnt/d/Code/sssd_cp_learning_and_testing/learning_and_testing/SSSD_CP/scripts/diffusion/../../envs/conda/utils.sh: line 12: conda: command not found
No Conda environment found matching
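The "conda: command not found" message means the script could not locate a conda executable. A small pre-flight check, sketched here with the standard-library `shutil.which`, could report this before a job is launched; `conda_on_path` is a hypothetical helper of mine and not something SSSD_CP provides.

```python
import shutil


def conda_on_path() -> bool:
    """Return True if a `conda` executable is found on PATH."""
    return shutil.which("conda") is not None


if not conda_on_path():
    print("conda not found on PATH; activate or initialize conda first")
```

This only checks PATH, so it is a rough guide: it says nothing about whether the sssd environment itself exists.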
import torch


def get_mask_forecast(sample: torch.Tensor, k: int) -> torch.Tensor:
    """
    Get mask of same segments (black-out missing) across channels based on k.

    Args:
        sample (torch.Tensor): Tensor of shape [# of samples, # of channels].
        k (int): Number of missing values.

    Returns:
        torch.Tensor: Mask of sample's shape where 0's indicate missing values to be imputed, and 1's indicate preserved values.
    """
    mask = torch.ones_like(sample)  # Initialize mask with all ones
    # Calculate the indices of missing values
    s_nan = torch.arange(mask.shape[0] - k, mask.shape[0])
    # Apply mask for each channel
    for channel in range(mask.shape[1]):
        mask[s_nan, channel] = 0
    return mask
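To see what this mask looks like without installing PyTorch, here is a dependency-free sketch of the same idea using nested lists. `forecast_mask_rows` is a hypothetical stand-in, it assumes the first dimension indexes time steps, and the sizes are shrunk for readability. With the settings in training.yaml and model.yaml (sequence length 100, missing_k 24), the last 24 steps would be zeroed in every channel.

```python
from typing import List


def forecast_mask_rows(n_steps: int, n_channels: int, k: int) -> List[List[int]]:
    """Return an n_steps x n_channels mask whose last k rows are 0."""
    mask = [[1] * n_channels for _ in range(n_steps)]
    # Zero the final k steps in every channel: the span to be forecast.
    for step in range(n_steps - k, n_steps):
        mask[step] = [0] * n_channels
    return mask


for row in forecast_mask_rows(n_steps=5, n_channels=3, k=2):
    print(row)
```

The first three rows print as all ones (preserved values) and the last two as all zeros (values to be generated).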
Juan Lopez Alcaraz, Nils Strodthoff (2022). Diffusion-based time series imputation and forecasting with structured state space models. Transactions on Machine Learning Research. Retrieved from https://openreview.net/forum?id=hHiIbk7ApW