Time series with VTI – Part 1

In this series, we will investigate the Vanguard Total Stock Market Index Fund (VTI). We will work our way from basic statistical methods such as ARIMA, SARIMA, and SSA up to LSTM using PyTorch. We will use our models to predict prices for the next five years, because why not. For the ARIMA model, we will go through the tedious manual method first, but we will use pmdarima in a later post.

Honestly, I really wanted to work with aquaponics data, but I have not been able to find a good dataset, and I just don’t have the funds to build a system of my own yet. I will just play with VTI for now and then work on artificial aquaponics data in the near future.

We will be working with historical closing prices of VTI from 2001 to the present. The data is from Yahoo Finance.

First, we import a few statistical libraries, and load the data using Pandas.

import pandas as pd 
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.arima.model import ARIMA

df = pd.read_csv("/lakehouse/default/" + "Files/VTI.csv", parse_dates=['Date'], index_col='Date')
vti_time = df[['Close']]
vti_time.plot()
plt.show()

We can clearly see the housing crisis of 2007-2009 and the pandemic of 2020-2021. Index funds such as VTI are well known for being stable and high performing. Its compound annual growth rate (CAGR) is often quoted at about 10%, which we can check directly from the closing prices.

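# Total return over the whole sample
total_return = (vti_time.iloc[-1] - vti_time.iloc[0]) / vti_time.iloc[0]
# Elapsed calendar years from the date index; the rows are trading days,
# so dividing the row count by 365 would undercount the years
num_years = (vti_time.index[-1] - vti_time.index[0]).days / 365.25
yearly_return = (1 + total_return) ** (1 / num_years) - 1
print(yearly_return)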
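
As a sanity check on the formula, CAGR = (end price / start price)^(1 / years) - 1, here is a minimal sketch on made-up numbers. The helper name cagr and the two-point toy series are my own illustrations, not part of the VTI data.

```python
import pandas as pd

def cagr(prices: pd.Series) -> float:
    """Compound annual growth rate of a date-indexed price series."""
    total_return = prices.iloc[-1] / prices.iloc[0] - 1
    # Elapsed calendar years, taken from the dates rather than the row count
    num_years = (prices.index[-1] - prices.index[0]).days / 365.25
    return (1 + total_return) ** (1 / num_years) - 1

# Synthetic example (hypothetical data): a price that doubles over ten years
dates = pd.to_datetime(["2010-01-01", "2020-01-01"])
toy = pd.Series([100.0, 200.0], index=dates)
print(round(cagr(toy), 4))
```

A price that doubles in ten years grows at roughly 7% per year, which is a handy rule of thumb for judging whether a computed CAGR looks plausible.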

It’s probably better to look at quarterly data and perform a seasonal decomposition. As expected, the index fund has a long-term upward trend, but it also has a lot of noise, like a bunch of random walks with an upward drift. In the future, I will look at the seasonal component more carefully using SSA and the Fourier transform. Interestingly, when I plotted the ACF of the monthly differences of VTI, there was a significant peak at lag 13.

vti_group = vti_time.resample('3M').mean()
decomp = seasonal_decompose(vti_group,period=4)
decomp.plot()
plt.show()

Let’s plot both the ACF and the PACF of the first difference of our quarterly time series, vti_diff. Each plot has significant correlations at lag 1 and at another, higher lag. This suggests p, q > 1 in the ARIMA model. A quick test with adfuller shows that vti_diff is stationary, with a p-value much smaller than 0.05.

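# First difference of the quarterly series
vti_diff = vti_group['Close'].diff().dropna()
# Augmented Dickey-Fuller test; element [1] of the result is the p-value
print(adfuller(vti_diff)[1])
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))
plot_acf(vti_diff, lags=20, zero=False, ax=ax1)
plot_pacf(vti_diff, lags=20, zero=False, ax=ax2)
plt.show()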

Let’s try to find the best p and q for our ARIMA model. We will let p and q each run from 0 to 4, save every (p, q, AIC, BIC) tuple into a pandas DataFrame, and sort by AIC. We obtain p=3 and q=2. The diagnostics seem relatively healthy, and the residuals are almost normal. Note that BIC penalizes extra parameters more heavily than AIC, so it tends to pick a simpler model. In this case, it’s way too simple for VTI.

max_lag = 5  # p and q each run from 0 to 4
aic_bic_list = []
for p in range(max_lag):
    for q in range(max_lag):
        try:
            model = ARIMA(vti_group['Close'], order=[p,1,q])
            results = model.fit()
            aic_bic_list.append((p,q,results.aic,results.bic))
        except Exception:
            # Report the combinations that fail to fit
            print(p, q, None, None)
aic_bic_df = pd.DataFrame(aic_bic_list,columns=['p','q','aic','bic'])
# Lowest AIC first
print(aic_bic_df.sort_values('aic').head())

# Refit the model with the lowest AIC
model = ARIMA(vti_group,order=[3,1,2])
results = model.fit()
print(results.summary())
results.plot_diagnostics()
plt.tight_layout()
plt.show()
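
To make the AIC versus BIC comment concrete, here is a sketch using the textbook definitions, AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L). The log-likelihoods, parameter counts, and sample size below are made-up numbers for illustration, not values from the VTI fit, and the helper names aic and bic are my own.

```python
import math

def aic(log_likelihood: float, k: int) -> float:
    """Akaike information criterion: each parameter costs 2."""
    return 2 * k - 2 * log_likelihood

def bic(log_likelihood: float, k: int, n: int) -> float:
    """Bayesian information criterion: each parameter costs ln(n)."""
    return k * math.log(n) - 2 * log_likelihood

n_obs = 90  # hypothetical number of quarterly observations

# Hypothetical fits: the larger model has a somewhat better log-likelihood
small = {"llf": -300.0, "k": 2}
large = {"llf": -294.0, "k": 6}

for name, m in [("small", small), ("large", large)]:
    print(name, round(aic(m["llf"], m["k"]), 2), round(bic(m["llf"], m["k"], n_obs), 2))
```

With these numbers, AIC prefers the larger model while BIC prefers the smaller one, because ln(90) is about 4.5 per parameter versus a flat 2 for AIC. Once n is larger than about 7, BIC always charges more per parameter than AIC does.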

Finally, let’s make some TERRIBLE forecasts. Interestingly, the forecast shows a slight downward trend over the next five years (20 quarterly steps).


# Forecast 20 quarters (five years) ahead
dynamic_forecast = results.get_forecast(steps=20)
mean_forecast = dynamic_forecast.predicted_mean
confidence_intervals = dynamic_forecast.conf_int()
lower_limits = confidence_intervals.loc[:,'lower Close']
upper_limits = confidence_intervals.loc[:,'upper Close']
plt.plot(vti_group.index, vti_group, label='observed')
plt.plot(mean_forecast.index, mean_forecast, color='r', label='forecast')
# Shade the confidence interval around the forecast
plt.fill_between(lower_limits.index, lower_limits,
                 upper_limits, color='pink')
plt.xlabel('Date')
plt.ylabel('VTI Quarterly Mean Close Price')
plt.legend(loc='upper left')
plt.show()
