Description
I'm experimenting conformal prediction for in-sample forecasting over high-frequency time data (epoch=5mins) retrieving prediction interval using predict_interval()
. The size of uni-variate (1D) time-series data is (8640, 1)
meaning a single column value over time for 8460 time steps. I have applied different regressors to forecast tail of time data (the last 7 days). I'm usingForecasterRecursive()
according to our discussion with @JoaquinAmatRodrigo in this issue#43
runtime_train = []
runtime_predict = []
for i in range(len(name_model)):
forecaster = ForecasterRecursive(
regressor=models[i],
lags=lags
)
start_time = datetime.now()
forecaster.fit(y=data_train_long[name_columns])
runtime_t = datetime.now() - start_time
runtime_train.append(runtime_t)
#start_time = datetime.now()
y_train_pred = forecaster.predict(steps=len(data_train_long), last_window=None)
last_window = data_test_long[name_columns].tail(steps)
start_time = datetime.now()
y_test_pred=forecaster.predict_interval(
steps=steps,
interval=[1, 99],
n_boot=100,
last_window=last_window)
y_test_pred.index = data_test_long.index
y_test_pred.to_excel( f"{result_dir}/y_test_pred_in_{name_model[i]}.xlsx")
y_train_pred.to_excel(f"{result_dir2}/y_train_pred_in_{name_model[i]}.xlsx")
runtime_p = datetime.now() - start_time
runtime_predict.append(runtime_p)
print(f"{name_model[i]} is done.")
print("*"*30)
I have used following to calculate coverage
to compare models forecast results:
from mapie.metrics import regression_coverage_score
coverage = regression_coverage_score(
data_test[name_columns],
y_test_pred - std* 1.96,
y_test_pred + std* 1.96
)
following are the results mainly coverage=1.0 (maybe rounded up when they display in box) :
Yesterday, I saw a new post on point for me on LinkedIn from @JoaquinAmatRodrigo from SKforecast group, about some improvements achieved for better Coverage.
I checked the comment and example notebook and noticed following calculation for coverage
:
# Predicted interval coverage (on test data)
# ==============================================================================
def empirical_coverage(y, lower_bound, upper_bound):
"""
Calculate coverage of a given interval
"""
return np.mean(np.logical_and(y >= lower_bound, y <= upper_bound))
coverage = empirical_coverage(
y = data.loc[end_validation:, 'OT'],
lower_bound = predictions["lower_bound"],
upper_bound = predictions["upper_bound"]
)
print(f"Predicted interval coverage: {round(100 * coverage, 2)} %")
# Area of the interval
# ==============================================================================
area = (predictions["upper_bound"] - predictions["lower_bound"]).sum()
print(f"Area of the interval: {round(area, 2)}")
I also see one post here they deploy other wrapper and use different way to calculate as follows:
#!pip install statsforecast
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import SeasonalNaive
from statsforecast.utils import ConformalIntervals
train = pd.read_csv('https://auto-arima-results.s3.amazonaws.com/M4-Hourly.csv')
test = pd.read_csv('https://auto-arima-results.s3.amazonaws.com/M4-Hourly-test.csv').rename(columns={'y': 'y_test'})
n_series = 8
uids = train['unique_id'].unique()[:n_series] # select first n_series of the dataset
train = train.query('unique_id in @uids')
test = test.query('unique_id in @uids')
models = [SeasonalNaive(25)]
sf = StatsForecast(
models=models,
freq=1,
n_jobs=-1)
levels = [80] # confidence levels of the prediction intervals
forecasts = sf.forecast(df=train, h=48, level=levels)
plot_test = test.merge(forecasts, how='left', on=['unique_id', 'ds'])
train_test = pd.concat([train, test.rename(columns={'y_test':'y'})])
# coverage
((plot_test['SeasonalNaive-lo-80']<=plot_test['y_test']) & (plot_test['y_test']<=plot_test['SeasonalNaive-hi-80'])).mean()
# 0.9192708333333334
Can you help me to diagnose taht what is wrong and why most of models I used using skforecast
with even not good prediction interval that can follow dynamic of test data but they return coverage=1.000?
I'm confused a bit.
Thanks in advance