Forecasting timeseries with PyTorch - dataloaders, normalizers, metrics and models

Project description

PyTorch Forecasting

Documentation | Tutorials | Release Notes

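PyTorch Forecasting is a PyTorch-based package for forecasting time series with state-of-the-art network architectures. It provides a high-level API for training networks on pandas data frames and leverages PyTorch Lightning for scalable training on (multiple) GPUs and CPUs and for automatic logging.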


Our article on Towards Data Science introduces the package and provides background information.

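PyTorch Forecasting aims to ease state-of-the-art timeseries forecasting with neural networks, both for real-world use cases and for research. The goal is to provide a high-level API with maximum flexibility for professionals and reasonable defaults for beginners. Specifically, the package provides: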

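  • A timeseries dataset class which abstracts handling variable transformations, missing values, randomized subsampling, multiple history lengths, etc.
  • A base model class which provides basic training of timeseries models along with logging in tensorboard and generic visualizations such as actuals vs predictions and dependency plots
  • Multiple neural network architectures for timeseries forecasting that have been enhanced for real-world deployment and come with in-built interpretation capabilities
  • Multi-horizon timeseries metrics
  • Ranger optimizer for faster model training
  • Hyperparameter tuning with optuna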

The package is built on pytorch-lightning, which allows training on CPUs, single GPUs and multiple GPUs out of the box.
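To make the idea of a multi-horizon quantile metric more concrete, here is a minimal sketch of the standard pinball (quantile) loss averaged over several forecast horizons. It is written in plain Python for illustration only and is not the package's own QuantileLoss implementation; the function names and the quantile levels in the example are assumptions.

```python
from typing import Sequence


def pinball_loss(y_true: float, y_pred: float, quantile: float) -> float:
    """Pinball (quantile) loss for one observation and one quantile level."""
    error = y_true - y_pred
    # Under-prediction (error > 0) is weighted by q, over-prediction by (1 - q).
    return max(quantile * error, (quantile - 1.0) * error)


def mean_multi_horizon_quantile_loss(
    y_true: Sequence[float],
    y_pred: Sequence[Sequence[float]],
    quantiles: Sequence[float],
) -> float:
    """Average the pinball loss over all horizons and quantile levels.

    y_true[h] is the observed value at horizon h and
    y_pred[h][i] is the forecast for quantiles[i] at horizon h.
    """
    total = 0.0
    count = 0
    for actual, preds in zip(y_true, y_pred):
        for q, pred in zip(quantiles, preds):
            total += pinball_loss(actual, pred, q)
            count += 1
    return total / count


if __name__ == "__main__":
    quantiles = [0.1, 0.5, 0.9]  # illustrative levels, not library defaults
    observed = [10.0, 12.0]  # two forecast horizons
    forecasts = [[8.0, 10.0, 13.0], [9.0, 11.0, 14.0]]
    print(mean_multi_horizon_quantile_loss(observed, forecasts, quantiles))
```

A low quantile such as 0.1 penalizes over-prediction more heavily than under-prediction, so minimizing this loss pushes each output towards the corresponding quantile of the target distribution.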

Installation

If you are working on Windows, you need to first install PyTorch with

pip install torch -f https://download.pytorch.org/whl/torch_stable.html

Otherwise, you can proceed with

pip install pytorch-forecasting

Alternatively, you can install the package via conda

conda install pytorch-forecasting "pytorch>=1.7" -c pytorch -c conda-forge

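PyTorch Forecasting is now installed from the conda-forge channel, while PyTorch is installed from the pytorch channel.

To use the MQF2 loss (multivariate quantile loss), also install the corresponding extra with pip install pytorch-forecasting[mqf2]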
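After installing, it can be useful to confirm which versions ended up in the environment. The following is a small sketch using only the Python standard library (Python 3.8 or later); the helper name installed_version is hypothetical, and the distribution names are the ones used in the pip commands above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Distribution names as used in the pip commands above.
    for name in ("torch", "pytorch-forecasting"):
        found = installed_version(name)
        print(f"{name}: {found if found is not None else 'not installed'}")
```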

Documentation

Visit https://pytorch-forecasting.readthedocs.io to read the documentation with detailed tutorials.

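Available models

The documentation provides a comparison of the available models.

To implement new models or other custom components, see the "How to implement new models" tutorial. It covers basic as well as advanced architectures.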

Usage example

Networks can be trained with the PyTorch Lightning Trainer on pandas data frames that are first converted to a TimeSeriesDataSet.

# imports for training
import pytorch_lightning as pl
from pytorch_lightning.loggers import TensorBoardLogger
from pytorch_lightning.callbacks import EarlyStopping, LearningRateMonitor
# import dataset, network to train and metric to optimize
from pytorch_forecasting import TimeSeriesDataSet, TemporalFusionTransformer, QuantileLoss

# load data: this is pandas dataframe with at least a column for
# * the target (what you want to predict)
# * the timeseries ID (which should be a unique string to identify each timeseries)
# * the time of the observation (which should be a monotonically increasing integer)
data = ...

# define the dataset, i.e. add metadata to pandas dataframe for the model to understand it
max_encoder_length = 36
max_prediction_length = 6
training_cutoff = "YYYY-MM-DD"  # day for cutoff

training = TimeSeriesDataSet(
    data[lambda x: x.date <= training_cutoff],
    time_idx= ...,  # column name of time of observation
    target= ...,  # column name of target to predict
    group_ids=[ ... ],  # column name(s) for timeseries IDs
    max_encoder_length=max_encoder_length,  # how much history to use
    max_prediction_length=max_prediction_length,  # how far to predict into future
    # covariates static for a timeseries ID
    static_categoricals=[ ... ],
    static_reals=[ ... ],
    # covariates known and unknown in the future to inform prediction
    time_varying_known_categoricals=[ ... ],
    time_varying_known_reals=[ ... ],
    time_varying_unknown_categoricals=[ ... ],
    time_varying_unknown_reals=[ ... ],
)

# create validation dataset using the same normalization techniques as for the training dataset
validation = TimeSeriesDataSet.from_dataset(training, data, min_prediction_idx=training.index.time.max() + 1, stop_randomization=True)

# convert datasets to dataloaders for training
batch_size = 128
train_dataloader = training.to_dataloader(train=True, batch_size=batch_size, num_workers=2)
val_dataloader = validation.to_dataloader(train=False, batch_size=batch_size, num_workers=2)

# create PyTorch Lightning Trainer with early stopping
early_stop_callback = EarlyStopping(monitor="val_loss", min_delta=1e-4, patience=1, verbose=False, mode="min")
lr_logger = LearningRateMonitor()
trainer = pl.Trainer(
    max_epochs=100,
    gpus=0,  # run on CPU, if on multiple GPUs, use accelerator="ddp"
    gradient_clip_val=0.1,
    limit_train_batches=30,  # 30 batches per epoch
    callbacks=[lr_logger, early_stop_callback],
    logger=TensorBoardLogger("lightning_logs")
)

# define network to train - the architecture is mostly inferred from the dataset, so that only a few hyperparameters have to be set by the user
tft = TemporalFusionTransformer.from_dataset(
    # dataset
    training,
    # architecture hyperparameters
    hidden_size=32,
    attention_head_size=1,
    dropout=0.1,
    hidden_continuous_size=16,
    # loss metric to optimize
    loss=QuantileLoss(),
    # logging frequency
    log_interval=2,
    # optimizer parameters
    learning_rate=0.03,
    reduce_on_plateau_patience=4
)
print(f"Number of parameters in network: {tft.size()/1e3:.1f}k")

# find the optimal learning rate (pytorch-lightning 1.x exposes the finder on trainer.tuner)
res = trainer.tuner.lr_find(
    tft, train_dataloaders=train_dataloader, val_dataloaders=val_dataloader, early_stop_threshold=1000.0, max_lr=0.3,
)
# and plot the result - always visually confirm that the suggested learning rate makes sense
print(f"suggested learning rate: {res.suggestion()}")
fig = res.plot(show=True, suggest=True)
fig.show()

# fit the model on the data - redefine the model with the correct learning rate if necessary
trainer.fit(
    tft, train_dataloaders=train_dataloader, val_dataloaders=val_dataloader,
)
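The example above leaves the data loading step as a placeholder. As a rough illustration of the expected shape (one row per series and time step, a string series ID, and a monotonically increasing integer time index), the sketch below builds such rows with the standard library only. The column names series, date and value, the monthly frequency, and the cutoff date are all assumptions made for this example.

```python
from datetime import date
from typing import Any, Dict, List

Row = Dict[str, Any]


def month_index(day: date) -> int:
    """Map a date to a month counter so consecutive months differ by exactly 1."""
    return day.year * 12 + day.month


def add_time_idx(rows: List[Row]) -> List[Row]:
    """Return copies of the rows with a zero-based integer `time_idx` column."""
    first = min(month_index(row["date"]) for row in rows)
    return [{**row, "time_idx": month_index(row["date"]) - first} for row in rows]


# Hypothetical long-format data: two series with 48 monthly observations each.
raw_rows: List[Row] = [
    {"series": series, "date": date(2020 + m // 12, m % 12 + 1, 1), "value": float(m)}
    for series in ("store_a", "store_b")
    for m in range(48)
]
rows = add_time_idx(raw_rows)

# Hypothetical ISO-formatted cutoff, parsed so that dates are compared with dates.
training_cutoff = date.fromisoformat("2023-06-01")
train_rows = [row for row in rows if row["date"] <= training_cutoff]

# pandas.DataFrame(rows) would turn these records into a dataframe with
# "series" as group ID, "time_idx" as time index and "value" as target.
```

Deriving time_idx from the calendar month rather than from the row order keeps the index consistent across series, even when some series start later or have gaps.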
