Time Series Forecasting with XGBoost
Python中文社区
· 2021-04-07
XGBoost must be installed first, for example with pip:
sudo pip install xgboost
Once installed, you can confirm that it was installed successfully and that you are using a modern version by running the following code:
# xgboost
import xgboost
print("xgboost", xgboost.__version__)
xgboost 1.0.1
from xgboost import XGBRegressor

# define model
model = XGBRegressor()
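XGBRegressor follows the familiar scikit-learn API: fit() trains the model on input and output columns, and predict() returns a forecast for new input rows. Below is a minimal sketch, assuming a toy supervised framing (lagged inputs and next-step targets, as constructed in the next section) and illustrative, untuned hyperparameters:
from numpy import asarray
from xgboost import XGBRegressor

# toy supervised data: each input is the previous observation, each output the next one
X = asarray([[100], [110], [108], [115]])
y = asarray([110, 108, 115, 120])
# define and fit a small model (hyperparameters are illustrative only)
model = XGBRegressor(objective='reg:squarederror', n_estimators=100)
model.fit(X, y)
# predict the value that follows the last observation
print(model.predict(asarray([[120]])))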
A time series must first be transformed into a supervised learning dataset before XGBoost can be applied. For example, consider a univariate series:
time, measure
1, 100
2, 110
3, 108
4, 115
5, 120
Using a sliding window, the value at the previous time step is used as the input to predict the value at the current time step, giving:
X, y
?, 100
100, 110
110, 108
108, 115
115, 120
120, ?
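The lag column in the table above can be created directly with the Pandas shift() function. Here is a minimal sketch on the five-value example series (the column names 'X' and 'y' are chosen purely for illustration):
from pandas import DataFrame
from pandas import concat

# the five observations from the example above
df = DataFrame({'measure': [100, 110, 108, 115, 120]})
# shift the series down one step to form the input X; the original column is the output y
supervised = concat([df.shift(1), df], axis=1)
supervised.columns = ['X', 'y']
print(supervised)
# the first row has NaN for X (no prior observation), matching the '?' in the table above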
More generally, the Pandas shift() function can automatically create new framings of a time series problem for any number of input and output steps. The series_to_supervised() function below implements this transformation.
# transform a time series dataset into a supervised learning dataset
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg.values
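As a quick sanity check, the function can be applied to the small example series from above (n_in=2 chosen for illustration; the pandas imports used inside series_to_supervised() are assumed to be in scope):
from numpy import array
from pandas import DataFrame
from pandas import concat

# the five observations from the example above, as a column vector
data = array([[100], [110], [108], [115], [120]])
# each returned row is [t-2, t-1, t]; rows with missing lags are dropped
print(series_to_supervised(data, n_in=2))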
The model will be evaluated with walk-forward validation: at each step of the test set, the xgboost_forecast() function is called to fit a model on all prior history and make a one-step prediction. The mean absolute error is then calculated over all predictions and the details are returned for analysis. The walk_forward_validation() function below implements this for univariate data.
# walk-forward validation for univariate data
def walk_forward_validation(data, n_test):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # split test row into input and output columns
        testX, testy = test[i, :-1], test[i, -1]
        # fit model on history and make a prediction
        yhat = xgboost_forecast(history, testX)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
        # summarize progress
        print('>expected=%.1f, predicted=%.1f' % (testy, yhat))
    # estimate prediction error
    error = mean_absolute_error(test[:, -1], predictions)
    return error, test[:, -1], predictions
Walk-forward validation uses the train_test_split() function to split the dataset into train and test sets. Note that the split is not random: the last n_test rows are held out for testing, preserving temporal order. We can define this function below.
# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
    return data[:-n_test, :], data[-n_test:, :]
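A quick check of the behaviour (array contents assumed purely for illustration): the last n_test rows become the test set and row order is preserved:
from numpy import arange

# ten rows of two columns
data = arange(20).reshape(10, 2)
train, test = train_test_split(data, n_test=3)
print(train.shape, test.shape)  # (7, 2) (3, 2)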
It also requires the xgboost_forecast() function, which takes the training dataset and a test input row as input, fits a model, and makes a one-step prediction.
# fit an xgboost model and make a one step prediction
def xgboost_forecast(train, testX):
    # transform list into array
    train = asarray(train)
    # split into input and output columns
    trainX, trainy = train[:, :-1], train[:, -1]
    # fit model
    model = XGBRegressor(objective='reg:squarederror', n_estimators=1000)
    model.fit(trainX, trainy)
    # make a one-step prediction
    yhat = model.predict(asarray([testX]))
    return yhat[0]
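As a small illustration, the function can be called directly on a hypothetical history of supervised rows (values assumed for illustration; xgboost_forecast() and its imports are assumed to be defined as above):
# tiny supervised history: each row is [previous value, current value]
history = [[100, 110], [110, 108], [108, 115], [115, 120]]
# forecast the value that follows an input of 120
yhat = xgboost_forecast(history, [120])
print(yhat)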
The dataset of daily total female births (California, 1959) and its description are available at the two URLs below. Download the CSV file and save it in your current working directory as 'daily-total-female-births.csv':
https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.csv
https://raw.githubusercontent.com/jbrownlee/Datasets/master/daily-total-female-births.names
"Date","Births"
"1959-01-01",35
"1959-01-02",32
"1959-01-03",30
"1959-01-04",31
"1959-01-05",44
...
# load and plot the time series dataset
from pandas import read_csv
from matplotlib import pyplot
# load dataset
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)
values = series.values
# plot dataset
pyplot.plot(values)
pyplot.show()
# forecast daily female births with xgboost
from numpy import asarray
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from sklearn.metrics import mean_absolute_error
from xgboost import XGBRegressor
from matplotlib import pyplot
# transform a time series dataset into a supervised learning dataset
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg.values
# split a univariate dataset into train/test sets
def train_test_split(data, n_test):
    return data[:-n_test, :], data[-n_test:, :]
# fit an xgboost model and make a one step prediction
def xgboost_forecast(train, testX):
    # transform list into array
    train = asarray(train)
    # split into input and output columns
    trainX, trainy = train[:, :-1], train[:, -1]
    # fit model
    model = XGBRegressor(objective='reg:squarederror', n_estimators=1000)
    model.fit(trainX, trainy)
    # make a one-step prediction
    yhat = model.predict(asarray([testX]))
    return yhat[0]
# walk-forward validation for univariate data
def walk_forward_validation(data, n_test):
    predictions = list()
    # split dataset
    train, test = train_test_split(data, n_test)
    # seed history with training dataset
    history = [x for x in train]
    # step over each time-step in the test set
    for i in range(len(test)):
        # split test row into input and output columns
        testX, testy = test[i, :-1], test[i, -1]
        # fit model on history and make a prediction
        yhat = xgboost_forecast(history, testX)
        # store forecast in list of predictions
        predictions.append(yhat)
        # add actual observation to history for the next loop
        history.append(test[i])
        # summarize progress
        print('>expected=%.1f, predicted=%.1f' % (testy, yhat))
    # estimate prediction error
    error = mean_absolute_error(test[:, -1], predictions)
    return error, test[:, -1], predictions
# load the dataset
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)
values = series.values
# transform the time series data into supervised learning
data = series_to_supervised(values, n_in=6)
# evaluate
mae, y, yhat = walk_forward_validation(data, 12)
print('MAE: %.3f' % mae)
# plot expected vs predicted
pyplot.plot(y, label='Expected')
pyplot.plot(yhat, label='Predicted')
pyplot.legend()
pyplot.show()
Running the example reports the expected and predicted value for each step of the test set, and finally the MAE across all 12 predictions:
>expected=42.0, predicted=44.5
>expected=53.0, predicted=42.5
>expected=39.0, predicted=40.3
>expected=40.0, predicted=32.5
>expected=38.0, predicted=41.1
>expected=44.0, predicted=45.3
>expected=34.0, predicted=40.2
>expected=37.0, predicted=35.0
>expected=52.0, predicted=32.5
>expected=48.0, predicted=41.4
>expected=55.0, predicted=46.6
>expected=50.0, predicted=47.2
MAE: 5.957
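To put the MAE in context, a simple persistence baseline (using the previous day's value as the prediction, i.e. the last lag column of each test row) can be computed on the same test split. This is a hedged sketch that reuses the data array from the listing above:
from sklearn.metrics import mean_absolute_error

# last 12 rows are the test set; column -2 is the t-1 lag, column -1 is the target
persistence = data[-12:, -2]
actual = data[-12:, -1]
print('Persistence MAE: %.3f' % mean_absolute_error(actual, persistence))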
# finalize model and make a prediction for daily female births with xgboost
from numpy import asarray
from pandas import read_csv
from pandas import DataFrame
from pandas import concat
from xgboost import XGBRegressor
# transform a time series dataset into a supervised learning dataset
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
    n_vars = 1 if type(data) is list else data.shape[1]
    df = DataFrame(data)
    cols = list()
    # input sequence (t-n, ... t-1)
    for i in range(n_in, 0, -1):
        cols.append(df.shift(i))
    # forecast sequence (t, t+1, ... t+n)
    for i in range(0, n_out):
        cols.append(df.shift(-i))
    # put it all together
    agg = concat(cols, axis=1)
    # drop rows with NaN values
    if dropnan:
        agg.dropna(inplace=True)
    return agg.values
# load the dataset
series = read_csv('daily-total-female-births.csv', header=0, index_col=0)
values = series.values
# transform the time series data into supervised learning
train = series_to_supervised(values, n_in=6)
# split into input and output columns
trainX, trainy = train[:, :-1], train[:, -1]
# fit model
model = XGBRegressor(objective='reg:squarederror', n_estimators=1000)
model.fit(trainX, trainy)
# construct an input for a new prediction
row = values[-6:].flatten()
# make a one-step prediction
yhat = model.predict(asarray([row]))
print('Input: %s, Predicted: %.3f' % (row, yhat[0]))
Input: [34 37 52 48 55 50], Predicted: 42.708
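When the next actual observation becomes available, the same one-step procedure is simply repeated: append the new value to the series, rebuild the input row from the most recent six observations, and predict again. A minimal sketch, reusing the variables from the listing above and a purely hypothetical new observation:
from numpy import append
from numpy import asarray

# hypothetical: the true value observed for the day we just predicted
new_observation = 45
# extend the raw series with the new observation (result is flattened to 1-D)
values = append(values, new_observation)
# rebuild the input row from the most recent six observations
row = values[-6:].flatten()
# predict the following day with the already-fitted model
# (in practice you would typically refit on the extended series first)
yhat = model.predict(asarray([row]))
print('Input: %s, Predicted: %.3f' % (row, yhat[0]))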
Author: 沂水寒城, CSDN blog expert. Research interests: machine learning, deep learning, NLP, CV.
Blog: http://yishuihancheng.blog.csdn.net