1140506 meeting

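Introduction

For this experiment, all files in the /home/ directory on the NCHC high-speed file system (HFS) were wiped and re-uploaded, and the experiment was run again. The training set is the same as last week's; only the configuration files were changed, as follows: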

Configuration files

training.yaml

# Training configuration
batch_size: 1  # Batch size
output_directory: "./results/real_time"  # Output directory for checkpoints and logs
ckpt_iter: "max"  # Checkpoint mode (max or min)
iters_per_ckpt: 1000  # Checkpoint frequency (number of iterations)
iters_per_logging: 100  # Log frequency (number of iterations)
n_iters: 20000  # Maximum number of iterations
learning_rate: 0.001  # Learning rate

# Additional training settings
only_generate_missing: true  # Generate missing values only
use_model: 2  # Model to use for training
masking: "rm"  # Masking strategy for missing values
missing_k: 200  # Number of missing values

# Data paths
data:
  train_path: "./datasets/real_time/pollutants_train.npy"  # Path to training data

model.yaml

wavenet:
  # WaveNet model parameters
  input_channels: 26  # Number of input channels
  output_channels: 26  # Number of output channels
  residual_layers: 36  # Number of residual layers
  residual_channels: 256  # Number of channels in residual blocks
  skip_channels: 256  # Number of channels in skip connections

  # Diffusion step embedding dimensions
  diffusion_step_embed_dim_input: 128  # Input dimension
  diffusion_step_embed_dim_hidden: 512  # Middle dimension
  diffusion_step_embed_dim_output: 512  # Output dimension

  # Structured State Spaces sequence model (S4) configurations
  s4_max_sequence_length: 2000  # Maximum sequence length
  s4_state_dim: 64  # State dimension
  s4_dropout: 0.0  # Dropout rate
  s4_bidirectional: true  # Whether to use bidirectional layers
  s4_use_layer_norm: true  # Whether to use layer normalization

diffusion:
  # Diffusion model parameters
  T: 200  # Number of diffusion steps
  beta_0: 0.0001  # Initial beta value
  beta_T: 0.02  # Final beta value
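
To make the diffusion block concrete, the sketch below expands T, beta_0, and beta_T into a per-step noise schedule. It assumes a linear schedule between beta_0 and beta_T, which is a common choice but is not confirmed by the configuration above; the function name linear_beta_schedule is hypothetical.

```python
from itertools import accumulate
from operator import mul


def linear_beta_schedule(T: int, beta_0: float, beta_T: float) -> list[float]:
    """Return T evenly spaced beta values from beta_0 to beta_T (assumed linear)."""
    if T < 2:
        raise ValueError("T must be at least 2")
    step = (beta_T - beta_0) / (T - 1)
    return [beta_0 + step * t for t in range(T)]


# Values taken from the diffusion section of model.yaml.
betas = linear_beta_schedule(T=200, beta_0=0.0001, beta_T=0.02)

# alpha_bars[t] is the cumulative product of (1 - beta) up to step t. In the
# standard DDPM formulation it is the squared scale applied to the clean
# signal after t + 1 noising steps.
alpha_bars = list(accumulate((1.0 - b for b in betas), mul))

print(f"first beta: {betas[0]:.4f}, last beta: {betas[-1]:.4f}")
print(f"alpha_bar after all {len(betas)} steps: {alpha_bars[-1]:.4f}")
```

Under this assumption, a larger T with the same beta range pushes the final alpha_bar closer to zero, so the last step is closer to pure noise.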

inference.yaml

# Inference configuration
batch_size: 1  # Batch size for inference
output_directory: "./results/real_time/12/inference/mnr"  # Output directory for inference results
ckpt_path: "./results/real_time"  # Path to checkpoint for inference
trials: 1 # Replications

# Additional training settings
only_generate_missing: true  # Generate missing values only
use_model: 2  # Model to use for training
masking: "mnr"  # Masking strategy for missing values
missing_k: 12  # Number of missing values

# Data paths
data:
  test_path: "./datasets/real_time/pollutants_test.npy"  # Path to test data

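Adjustments

After troubleshooting, the batch_size parameter was adjusted for this experiment in the hope of getting better results. The code affected by batch_size is shown below:

From the run_job() function in /scripts/diffusion/infer.py: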

def run_job(
    model_config: dict,
    inference_config: dict,
    device: Optional[Union[torch.device, str]],
    ckpt_iter: Union[str, int],
) -> None:

    ...
    dataloader = get_dataloader(
        inference_config["data"]["test_path"],
        batch_size,
        device=device,
    )
    ...

Then, from /sssd/data/utils.py:

def get_dataloader(
    path: str,
    batch_size: int,
    is_shuffle: bool = True,
    device: Union[str, torch.device] = "cpu",
    num_workers: int = 0,
) -> DataLoader:
    """
    Get a PyTorch DataLoader for the dataset stored at the given path.

    Args:
        path (str): Path to the dataset file.
        batch_size (int): Size of each batch.
        is_shuffle (bool, optional): Whether to shuffle the dataset. Defaults to True.
        device (Union[str, torch.device], optional): Device to move the data to. Defaults to "cpu".
        num_workers (int, optional): Number of subprocesses to use for data loading. Defaults to 0.

    Returns:
        DataLoader: PyTorch DataLoader for the dataset.
    """
    dataset = TensorDataset(torch.from_numpy(np.load(path)).to(dtype=torch.float32))
    pin_memory = device == "cuda" or device == torch.device("cuda")
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=is_shuffle,
        pin_memory=pin_memory,
        num_workers=num_workers,
    )

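It follows that the dataloader yields batches of batch_size samples taken along the first of the input data's three dimensions. For example, if the data has shape (5, 2000, 26) and batch_size is set to 1, each batch has shape (1, 2000, 26), and one pass over the data (one epoch) consists of $5 \div 1 = 5$ iterations. This also explains the purpose of the loop in the DiffusionGenerator class in /sssd/inference/generator.py.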

def generate(self) -> list:
    ...
    for index, (batch,) in enumerate(self.dataloader):
        ...

Therefore, batch_size should be set no larger than the size of the dataset's first dimension, and ideally to a value that divides it evenly.
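
The batch arithmetic above can be checked without loading any data. The following is a minimal sketch, assuming the DataLoader keeps its default drop_last=False (get_dataloader does not pass drop_last); batch_sizes_per_epoch is a hypothetical helper, not part of the project.

```python
import math


def batch_sizes_per_epoch(n_samples: int, batch_size: int) -> list[int]:
    """List the size of every batch produced in one pass over n_samples items.

    Mirrors a DataLoader with drop_last=False: the final batch is smaller
    whenever batch_size does not divide n_samples evenly.
    """
    if n_samples < 1 or batch_size < 1:
        raise ValueError("n_samples and batch_size must be positive")
    n_batches = math.ceil(n_samples / batch_size)
    sizes = [batch_size] * n_batches
    remainder = n_samples % batch_size
    if remainder:
        sizes[-1] = remainder  # leftover samples form a short final batch
    return sizes


# Test set of shape (5, 2000, 26): 5 samples along the first dimension.
for bs in (1, 2, 5, 8):
    sizes = batch_sizes_per_epoch(5, bs)
    shapes = [(s, 2000, 26) for s in sizes]
    print(f"batch_size={bs}: {len(sizes)} iterations, shapes {shapes}")
```

With batch_size=2 the last batch holds a single sample, and with batch_size=8 the only batch holds all 5 samples, which is why a divisor of the first dimension gives uniform batches.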

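The imputation results obtained with the configuration files above are as follows.

Imputation results

All of the methods below were tested with 200, 24, and 12 missing values. The test code is shown below; only missing_k needs to be changed.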

# test imputation
import os
import numpy as np
import matplotlib.pyplot as plt
from tqdm import tqdm

test_data = np.load(r'real_time\pollutants_test.npy').transpose(0, 2, 1)

## functions
def imputation_plot(full_data, missing_data, imputation_data, title, save_path):
    """
    Plot the first `show_dims` dimensions of the first sample, comparing full, missing, and imputed data.
    """
    os.makedirs(save_path, exist_ok=True)  # make sure the output directory exists
    show_dims = 2
    fig, axes = plt.subplots(show_dims, 1, figsize=(12, 8), sharex=True)
    #imputation_data = np.where(np.isnan(missing_data), imputation_data, np.nan)
    for j in range(show_dims):
        axes[j].plot(full_data[0, j], color='gray', label='full data', alpha=0.6)
        axes[j].plot(imputation_data[0, j], color='orange', label='imputation data', alpha=0.6)
        axes[j].plot(missing_data[0, j], color='red', label=title)
        axes[j].set_ylabel(f'Dim {j}')
        axes[j].legend()

    plt.suptitle(title)
    plt.xlabel('Time')
    plt.tight_layout()
    plt.savefig(f"{save_path}.png", dpi=300)
    plt.close(fig)  # release the figure once it has been saved

def imputation_plot_each_dim(full_data, missing_data, imputation_data, title, save_dir):
    """
    Plot each dimension of the first sample separately, comparing full data, missing data, and imputation data.
    Save each plot as a separate file.
    """
    num_dims = full_data.shape[1]
    os.makedirs(save_dir, exist_ok=True)  # make sure the output directory exists

    for j in tqdm(range(num_dims)):
        fig, ax = plt.subplots(figsize=(12, 4))
        ax.plot(full_data[0, j], color='gray', label='full data', alpha=0.6)
        ax.plot(imputation_data[0, j], color='orange', label='imputation data', alpha=0.6)
        ax.plot(missing_data[0, j], color='red', label=title)
        ax.set_ylabel(f'Dim {j}')
        ax.set_xlabel('Time')
        ax.set_title(f'{title} - Dimension {j}')

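        # Place the legend outside the axes
        ax.legend(
            loc='center left',           # anchor the legend by its centre-left point
            bbox_to_anchor=(1, 0.5)       # (x, y): x=1 is the right edge of the axes, so the legend sits just outside it
        )

        fig.tight_layout(rect=[0, 0, 0.85, 1])  # shrink the layout area to leave room on the right for the legend
        save_path = os.path.join(save_dir, f"{title}_dim{j}.png")
        plt.savefig(save_path, dpi=300)
        plt.close(fig)  # close the figure; otherwise memory blows up after many plots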

def analysis_predict_data(missing_k, path):
    folder_path = f'.\\real_time\\imputation\\{missing_k}\\predict\\{path}\\T200_beta00.0001_betaT0.02\\max'
    os.makedirs(folder_path, exist_ok=True)
    length = len(os.listdir(folder_path))
    imputation = []
    for i in range(length):
        file_path = os.path.join(folder_path, f'imputation{i}.npy')
        arr = np.load(file_path)
        imputation.append(arr)
    predict_data = np.concatenate(imputation, axis=0)
    print(f'Shape: {predict_data.shape}')
    print(f'NAs: {np.isnan(predict_data).sum()}')
    # print(predict_data[0, 0])
    print(f'MSPE for all: {((test_data - predict_data)**2).mean()}')

    test_data_predict = np.load(f'real_time\\{missing_k}\\pollutants_test_{path}.npy').transpose(0, 2, 1)
    print(f'MSPE only for missing: {((test_data[np.isnan(test_data_predict)] - predict_data[np.isnan(test_data_predict)])**2).mean()}')

    imputation_plot(test_data, test_data_predict, predict_data, f'imputation {path} test data', f'real_time\\imputation\\{missing_k}\\result\\predict\\imputation0_{path}')
    imputation_plot_each_dim(test_data, test_data_predict, predict_data, f'imputation {path} test data', f'real_time\\imputation\\{missing_k}\\result\\predict\\imputation0_{path}')
    # test_data[0, 0][1:10]
    # test_data_predict[0, 0][1:10]
    # predict_data[0, 0][1:10]

missing_k = 200

## rm
analysis_predict_data(missing_k, 'rm')

## rbm
analysis_predict_data(missing_k, 'rbm')

## bm
analysis_predict_data(missing_k, 'bm')

## tf
analysis_predict_data(missing_k, 'tf')



# original
def analysis_imputation_data(missing_k, path):
    folder_path = f'.\\real_time\\imputation\\{missing_k}\\inference\\{path}\\T200_beta00.0001_betaT0.02\\max'
    os.makedirs(folder_path, exist_ok=True)
    length = len(os.listdir(folder_path))
    imputation = []
    for i in range(length):
        file_path = os.path.join(folder_path, f'imputation{i}.npy')
        arr = np.load(file_path)
        imputation.append(arr)
    imputation_data = np.concatenate(imputation, axis=0)
    print(f'Shape: {imputation_data.shape}')
    print(f'NAs: {np.isnan(imputation_data).sum()}')
    # print(imputation_data[0, 0])
    print(f'MSPE: {((test_data - imputation_data)**2).mean()}')

    imputation_plot(test_data, test_data, imputation_data, f'imputation {path} test data', f'real_time\\imputation\\{missing_k}\\result\\original\\imputation0_{path}')
    imputation_plot_each_dim(test_data, test_data, imputation_data, f'imputation {path} test data', f'real_time\\imputation\\{missing_k}\\result\\original\\imputation0_{path}')
    # test_data[0, 0][1:10]
    # test_data_imputation[0, 0][1:10]
    # imputation_data[0, 0][1:10]

## rm
analysis_imputation_data(missing_k, 'rm')

## bm
analysis_imputation_data(missing_k, 'bm')

## mnr
analysis_imputation_data(missing_k, 'mnr')

## tf
analysis_imputation_data(missing_k, 'tf')
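
The two error figures printed by analysis_predict_data differ only in which positions are averaged. As a dependency-free illustration with made-up numbers (the lists and the mspe helper below are hypothetical, not project data), the masked variant restricts the mean to positions that are NaN in the masked test file, mirroring the np.isnan indexing above:

```python
import math
from typing import Optional


def mspe(truth: list, pred: list, masked: Optional[list] = None) -> float:
    """Mean squared prediction error.

    If `masked` is given, only positions where `masked` is NaN contribute,
    i.e. the values the model actually had to impute.
    """
    if masked is None:
        keep = [True] * len(truth)
    else:
        keep = [math.isnan(m) for m in masked]
    errors = [(t - p) ** 2 for t, p, k in zip(truth, pred, keep) if k]
    if not errors:
        raise ValueError("no positions selected")
    return sum(errors) / len(errors)


# Toy series: positions 1 and 3 were hidden from the model (NaN in `masked`).
truth = [1.0, 2.0, 3.0, 4.0]
masked = [1.0, math.nan, 3.0, math.nan]
pred = [1.0, 2.5, 3.0, 3.0]

print("MSPE for all:", mspe(truth, pred))
print("MSPE only for missing:", mspe(truth, pred, masked))
```

In this toy case the observed positions are reproduced exactly, so the full-set MSPE is diluted by zeros and the missing-only figure is the larger of the two.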

Original program (test set without missing values)

200

rm

MSPE over the full test set: 0.0007726947053672325

bm

MSPE over the full test set: 0.24869654130201302

mnr

MSPE over the full test set: 0.2518200360151487

forecast

MSPE over the full test set: 0.2584019887489863

24

rm

MSPE over the full test set: 0.0007776705576000954

bm

MSPE over the full test set: 0.24558736147078322

mnr

MSPE over the full test set: 0.2779062416883658

forecast

MSPE over the full test set: 0.2426996368188828

12

rm

MSPE over the full test set: 0.000343678513735235

bm

MSPE over the full test set: 0.12490228987565512

mnr

MSPE over the full test set: 0.12561325934437964

forecast

MSPE over the full test set: 0.1331316997123435

Prediction (test set may contain missing values)

200

rm

MSPE over the full test set: 0.4033083841718653

MSPE over the missing test-set values only: 0.5150290538329023

bm

MSPE over the full test set: 1.9847157860554776

MSPE over the missing test-set values only: 4.961677683754047

rbm

MSPE over the full test set: 0.7959954168054797

MSPE over the missing test-set values only: 1.2573786423253352

forecast

MSPE over the full test set: 2.0579853656490044

MSPE over the missing test-set values only: 5.144843697957479

24

rm

MSPE over the full test set: 0.38739405016442136

MSPE over the missing test-set values only: 0.37690720477779943

bm

MSPE over the full test set: 1.9683662364048184

MSPE over the missing test-set values only: 4.113597133774161e-05

rbm

MSPE over the full test set: 0.8399087830021398

MSPE over the missing test-set values only: 0.8648797892201264

forecast

MSPE over the full test set: 2.014067660524411

MSPE over the missing test-set values only: 4.566016572936741

12

rm

MSPE over the full test set: 0.00040013436784975165

MSPE over the missing test-set values only: 0.0002673385994639709

bm

MSPE over the full test set: 0.1269181803083845

MSPE over the missing test-set values only: 5.284446942912535

rbm

MSPE over the full test set: 0.00045168305235805304

MSPE over the missing test-set values only: 0.0025162975939579785

forecast

MSPE over the full test set: 0.12043755824949312

MSPE over the missing test-set values only: 5.0144033937895385

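Findings

Incidentally, this project cannot train on multiple GPUs; adding multi-GPU support could be considered in the future.

Reference run output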
Every 0.1s: nvidia-smi                   j5shm3test1140504-tv9mh: Sun May  4 14:26:39 2025

Sun May  4 14:26:39 2025
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.08             Driver Version: 535.161.08   CUDA Version: 12.8     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla V100-SXM2-32GB           On  | 00000000:3D:00.0 Off |                    0 |
| N/A   45C    P0             190W / 300W |  18629MiB / 32768MiB |    100%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   1  Tesla V100-SXM2-32GB           On  | 00000000:3E:00.0 Off |                    0 |
| N/A   33C    P0              41W / 300W |      3MiB / 32768MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
+---------------------------------------------------------------------------------------+

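Conclusion

Original program (test set without missing values)

| Metric | Method | 200 NAs | 24 NAs | 12 NAs |
|---|---|---|---|---|
| Full test set MSPE | rm | 0.00077 | 0.00078 | 0.00034 |
| | bm | 0.24870 | 0.24559 | 0.12490 |
| | mnr | 0.25182 | 0.27791 | 0.12561 |
| | forecast | 0.25840 | 0.24270 | 0.13313 |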

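Prediction (test set may contain missing values)

| Metric | Method | 200 NAs | 24 NAs | 12 NAs |
|---|---|---|---|---|
| Full test set MSPE | rm | 0.40331 | 0.38739 | 0.00040 |
| | bm | 1.98472 | 1.96837 | 0.12692 |
| | rbm | 0.79600 | 0.83991 | 0.00045 |
| | forecast | 2.05799 | 2.01407 | 0.12044 |
| Missing values only MSPE | rm | 0.51503 | 0.37691 | 0.00027 |
| | bm | 4.96168 | 0.00004 | 5.28445 |
| | rbm | 1.25738 | 0.86488 | 0.00252 |
| | forecast | 5.14484 | 4.56602 | 5.01440 |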

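So far, the model trained with masking: "rm" imputes rm-type missingness comparatively well but does worse on the other missingness patterns. One possible explanation is that the checkpoints only go up to 2,000 and training used masking: "rm", so the model has not learned the characteristics of the other patterns. The next step is to test the tf imputation method further.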
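
For reference, the sketch below illustrates one way the missingness patterns could be generated for a series with a given length and number of channels. The definitions are assumptions inferred from the pattern names, not the project's actual masking code: rm hides k random time steps per channel, rbm (assumed here to be the pattern labelled mnr in the configuration) hides one contiguous block of k steps placed independently per channel, bm hides the same block in all channels, and tf hides the last k steps. The function name make_mask is hypothetical.

```python
import random


def make_mask(pattern: str, length: int, channels: int, k: int, seed: int = 0) -> list:
    """Return a length x channels mask as nested lists: 1 = observed, 0 = missing.

    Assumed pattern definitions (not taken from the project code):
      rm  - k random time steps per channel
      rbm - one contiguous block of k steps, placed independently per channel
      bm  - one contiguous block of k steps shared by all channels
      tf  - the last k steps of every channel (forecasting)
    """
    if not 0 < k <= length:
        raise ValueError("k must be between 1 and length")
    rng = random.Random(seed)
    mask = [[1] * channels for _ in range(length)]
    shared_start = rng.randrange(length - k + 1)  # used by bm only
    for c in range(channels):
        if pattern == "rm":
            missing = rng.sample(range(length), k)
        elif pattern == "rbm":
            start = rng.randrange(length - k + 1)
            missing = range(start, start + k)
        elif pattern == "bm":
            missing = range(shared_start, shared_start + k)
        elif pattern == "tf":
            missing = range(length - k, length)
        else:
            raise ValueError(f"unknown pattern: {pattern}")
        for t in missing:
            mask[t][c] = 0
    return mask


# Small example: 10 time steps, 3 channels, 4 missing values per channel.
for name in ("rm", "rbm", "bm", "tf"):
    m = make_mask(name, length=10, channels=3, k=4)
    columns = ["".join(str(row[c]) for row in m) for c in range(3)]
    print(f"{name:>3}: {columns}")
```

Under these assumed definitions, rm gaps are scattered and usually have observed values nearby, whereas bm and tf remove a whole contiguous stretch, which may help explain why a model trained only on rm masks transfers poorly to them.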

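Runtime environment

  • Local operating system: Windows 11 24H2
    • Programming language: Python 3.12.9
  • Computing platform: Taiwan AI Cloud, National Center for High-performance Computing (NCHC), National Applied Research Laboratories
    • Operating system: Ubuntu
    • Miniconda
    • GPU: NVIDIA Tesla V100 32GB GPU
    • CUDA 12.8 driver
    • Programming language: Python 3.10.16 for Linux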

Further reading

References