Cointegration describes a linear combination of non-stationary time series that is itself stationary. One application is mean-reversion trading of a pair of assets, or of a basket of interrelated assets such as highly correlated cryptocurrencies. It can also be used for hedging.
For non-stationary time series \(x_t\) and \(y_t\), if \(a x_t + b y_t\) is stationary for some constants \(a\) and \(b\) (not both zero), then they are cointegrated.
A time series (or stochastic process) is defined to be strongly stationary if its joint probability distribution is invariant under translations in time or space. In particular, the mean and variance of the process do not change over time or space, and neither follows a trend.
The Hurst exponent \(H\) describes how the variance of log price changes scales with the lag \(\tau\): \[V(\tau)=E\{[\log(x_{t+\tau})-\log(x_t)]^2\}\sim \tau^{2H}\] A time series can then be characterized in the following manner: \(H < 0.5\): the time series is mean reverting; \(H = 0.5\): the time series is a geometric Brownian motion; \(H > 0.5\): the time series is trending.
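The scaling relation above suggests a simple estimator: compute the mean squared lagged difference for a range of lags and read \(2H\) off the slope of a log-log regression. The following is a minimal sketch in base R; the function name `hurst_exponent`, the lag range, and the simulated series are illustrative assumptions, not part of the original analysis.

```r
# Sketch: estimate H from V(tau) ~ tau^(2H) using a log-log regression.
# `hurst_exponent` and the default lag range are hypothetical choices.
hurst_exponent <- function(log_price, lags = 2:20) {
  # mean squared tau-lagged difference for each lag tau
  v <- sapply(lags, function(tau) mean(diff(log_price, lag = tau)^2))
  # the slope of log(V) against log(tau) estimates 2H
  fit <- lm(log(v) ~ log(lags))
  unname(coef(fit)[2]) / 2
}

set.seed(42)
n <- 5000
# log price of a geometric Brownian motion is a random walk
random_walk <- cumsum(rnorm(n))
# a stationary AR(1) series serves as a mean-reverting example
mean_reverting <- as.numeric(arima.sim(list(ar = 0.5), n = n))

h_rw <- hurst_exponent(random_walk)
h_mr <- hurst_exponent(mean_reverting)
cat("H (random walk):    ", h_rw, "\n")
cat("H (mean reverting): ", h_mr, "\n")
```

On simulated data like this, the random walk should give an estimate near 0.5 and the AR(1) series an estimate well below it. The estimate is sensitive to the chosen lag range, so it is best treated as a rough diagnostic.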
A continuous mean-reverting time series can be represented by an Ornstein-Uhlenbeck stochastic differential equation: \[dx_t=\theta(\mu-x_t)dt+\sigma dW_t\] where \(\theta\) is the rate of reversion to the mean \(\mu\), \(\sigma\) is the volatility, and \(W_t\) is a Wiener process. In a discrete setting the equation states that the change of the price series in the next time period is proportional to the difference between the mean price and the current price, with the addition of Gaussian noise. This property motivates the Augmented Dickey-Fuller (ADF) test.
Mathematically, the ADF test checks for the presence of a unit root in an autoregressive time series sample. It makes use of the fact that if a price series possesses mean reversion, then the next price change depends on the current price level. A linear lag model of order \(p\) is used for the time series: \[\Delta y_t=\alpha + \beta t + \gamma y_{t-1}+\delta_1 \Delta y_{t-1} + \cdots +\delta_{p-1} \Delta y_{t-p+1} + \epsilon_t\] If the hypothesis that \(\gamma=0\) can be rejected, then the next movement of the price series depends on the current price, and the series is unlikely to be a random walk. The test statistic is \[DF_\tau=\frac{\hat \gamma}{SE(\hat \gamma)}\] which is compared against Dickey-Fuller critical values rather than the usual t distribution. If the p-value is below the chosen significance level (0.01 here), the null hypothesis of a unit root is rejected and the time series is taken to be stationary.
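To make the link between the Ornstein-Uhlenbeck equation and the test regression concrete, the sketch below simulates a discretized mean-reverting series and fits the simplest Dickey-Fuller regression (a constant, no trend, no lagged differences) with `lm`. The parameter values and variable names are illustrative assumptions. The statistic has to be judged against Dickey-Fuller critical values (roughly -3.43 at the 1% level for the constant-only case in large samples), not the p-value that `lm` reports.

```r
set.seed(1)
n <- 1000
theta <- 0.1   # assumed speed of mean reversion
mu    <- 5     # assumed long-run mean
sigma <- 0.5   # assumed noise scale

# Euler discretization of dx = theta * (mu - x) dt + sigma dW with dt = 1
x <- numeric(n)
x[1] <- mu
for (t in 2:n) {
  x[t] <- x[t - 1] + theta * (mu - x[t - 1]) + sigma * rnorm(1)
}

# Dickey-Fuller regression: delta x_t = alpha + gamma * x_{t-1} + eps_t
dx    <- diff(x)
x_lag <- x[-n]
fit   <- lm(dx ~ x_lag)

coefs     <- coef(summary(fit))
gamma_hat <- coefs["x_lag", "Estimate"]   # should be close to -theta
df_stat   <- coefs["x_lag", "t value"]    # gamma_hat / SE(gamma_hat)

cat("gamma_hat:", gamma_hat, "\n")
cat("DF statistic:", df_stat, "\n")
```

Because the simulated series is mean reverting by construction, the estimate of \(\gamma\) should be near \(-\theta\) and the statistic strongly negative. `ur.df` from the `urca` package adds the lagged differences and reports the appropriate critical values.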
Consider time series \(x_t\) and \(y_t\) built from a common random walk \(z_t=z_{t-1}+w_t\), where \(w_t\), \(w_{x,t}\) and \(w_{y,t}\) are white noise: \[x_t=p z_t + w_{x,t}\] \[y_t = q z_t + w_{y,t}\] Taking a linear combination: \[a x_t + b y_t = (a p + b q) z_t + a w_{x,t} + b w_{y,t}\] So, if \(a p + b q = 0\), the random walk term cancels and we have a stationary time series.
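The construction above can be checked numerically. In this sketch the loadings \(p\) and \(q\), the weights \(a\) and \(b\), and the unit-variance noise terms are all assumed values chosen so that \(a p + b q = 0\).

```r
set.seed(7)
n <- 2000
z <- cumsum(rnorm(n))        # common random walk z_t

p <- 0.3; q <- 0.6           # assumed loadings on the random walk
x <- p * z + rnorm(n)        # x_t = p z_t + w_{x,t}
y <- q * z + rnorm(n)        # y_t = q z_t + w_{y,t}

a <- 2; b <- -1              # chosen so that a * p + b * q = 0
spread <- a * x + b * y      # reduces to a * w_{x,t} + b * w_{y,t}

cat("a*p + b*q:    ", a * p + b * q, "\n")
cat("var(x):       ", var(x), "\n")
cat("var(y):       ", var(y), "\n")
cat("var(spread):  ", var(spread), "\n")
```

With unit-variance noise, the spread has theoretical variance \(a^2 + b^2 = 5\) regardless of how far the random walk wanders, whereas the variances of \(x_t\) and \(y_t\) grow with the sample length.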
If \(x_t\) and \(y_t\) are non-stationary with order of integration \(d=1\), they are cointegrated if some linear combination of them is stationary, that is, if there is a \(\beta\) for which \(u_t\) is stationary: \[y_t-\beta x_t=u_t\] The Engle-Granger two-step method first estimates \(\beta\) by regressing \(y\) on \(x\) and runs a stationarity test such as Dickey-Fuller on the estimated residuals \(\hat u_t\). It then regresses the first-differenced variables on the lagged residuals \(\hat u_{t-1}\). With lag polynomials \(A(L)\) and \(B(L)\), the error correction model is \[A(L) \Delta y_t=\gamma+B(L) \Delta x_t+\alpha\left(y_{t-1}-\beta_0-\beta_1 x_{t-1}\right)+\nu_t\] The static regression \[y_t=\beta_0+\beta_1 x_t+\varepsilon_t\] gives the estimated residuals \[\hat{\varepsilon}_t=y_t-\hat{\beta}_0-\hat{\beta}_1 x_t\] which are substituted into the error correction model: \[A(L) \Delta y_t=\gamma+B(L) \Delta x_t+\alpha \hat{\varepsilon}_{t-1}+\nu_t\]
# clear R environment variables
remove(list=ls())
# load libraries
library("zoo")
library("urca")
# monthly prices of jet fuel and heating oil
prices <- read.zoo("JetFuelHedging.csv", sep = ",",
                   FUN = as.yearmon, format = "%Y-%m", header = TRUE)
# fit a linear model to explain jet fuel price change by heating oil price
# changes. the coefficient of the regression is the optimal hedge ratio.
# setting the intercept to zero, i.e., no cash holdings.
simple_mod <- lm(diff(prices$JetFuel) ~ diff(prices$HeatingOil)+0)
# summary(simple_mod)
cat('optimal hedge ratio: '); cat(unname(simple_mod$coefficients))
cat('residual standard error: '); cat(unname(summary(simple_mod)$sigma))
# plot prices
plot(prices$JetFuel, main = "Jet Fuel and Heating Oil Prices",
xlab = "Date", ylab = "USD")
lines(prices$HeatingOil, col = "red")
# augmented Dickey-Fuller test for unit root (non-stationarity)
jf_adf <- ur.df(prices$JetFuel, type = "drift")
summary(jf_adf)
ho_adf <- ur.df(prices$HeatingOil, type = "drift")
summary(ho_adf)
# non-stationarity cannot be rejected at 1% significance level
# estimate static equilibrium
mod_static <- summary(lm(prices$JetFuel ~ prices$HeatingOil))
error <- residuals(mod_static)
# Dickey-Fuller test on the residuals of the static regression
error_cadf <- ur.df(error, type = "none")
summary(error_cadf)
# very small p-value, so the residuals are stationary
# error correction model
djf <- diff(prices$JetFuel)
dho <- diff(prices$HeatingOil)
error_lag <- lag(error, k = -1)
mod_ecm <- lm(djf ~ dho + error_lag)
summary(mod_ecm)