Detrended Fluctuation Analysis (DFA) is a powerful method for analyzing long-range correlations in time series data. It is widely used in various fields, including finance, biology, and physics. This article will guide you through the DFA algorithm step by step, including the necessary formulas and code snippets to help you understand each stage of the process.
Overview
DFA aims to identify scaling properties of non-stationary time series. Unlike traditional methods, DFA can handle data with trends and non-stationarities. The core idea is to examine how fluctuations in the data vary with time scales.
Key Steps in DFA
Step 1: Cumulative Sum of the Data
The first step in DFA is to subtract the mean from the time series and compute the cumulative sum of the result. This mean-centred cumulative sum is often called the profile of the series.
\[y(t) = \sum_{i=1}^{t} (x_i - \langle x \rangle)\]
where:
\(y(t)\) is the cumulative sum at time t.
\(x_i\) is the value of the time series at index i.
\(\langle x \rangle\) is the mean of the time series.
Python Code
import numpy as np

def cumulative_sum(x):
    # Subtract the mean, then integrate to obtain the profile y(t)
    x = np.asarray(x, dtype=float)
    return np.cumsum(x - np.mean(x))
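As a small worked example of Step 1, the sketch below applies the same formula to a five-point series. The helper name `profile` is only illustrative. Because the deviations from the mean sum to zero, the last value of the cumulative sum is always zero.

```python
import numpy as np

def profile(x):
    """Mean-centred cumulative sum, as in Step 1 (illustrative helper)."""
    x = np.asarray(x, dtype=float)
    return np.cumsum(x - x.mean())

x = [1, 2, 3, 4, 5]      # the mean is 3, so the deviations are -2, -1, 0, 1, 2
y = profile(x)
print(y.tolist())        # [-2.0, -3.0, -3.0, -2.0, 0.0]
```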
Step 2: Detrending the Data and Calculating Local Fluctuations
Next, we split the cumulative sum \(y(t)\) into non-overlapping segments of length n and fit a straight line to each segment by least squares to remove the local trend.
For each segment:
\[X(t) = y(t) \quad \text{(cumulative-sum values in the segment)}\]
\[Y(t) = \text{slope} \cdot t + \text{intercept} \quad \text{(fitted line)}\]
The detrended values are computed as:
\[\varepsilon(t)=X(t)-Y(t)\]
The root-mean-square deviation from the local trend is then calculated for each segment, which gives the fluctuation function for that segment:
\[F(n, i) = \sqrt{\frac{1}{n} \sum_{t=(i-1)n+1}^{in} \left( y(t) - Y(t) \right)^2}\]
where:
\(F(n, i)\) is the fluctuation for segment i of length n, with \(i = 1, \ldots, N/n\) and N the length of the time series.
\(y(t)\) is the cumulative sum from Step 1.
\(Y(t)\) is the fitted value from the linear regression.
Python Code
import numpy as np

def calc_rms(x, scale):
    # Split x into non-overlapping windows of length `scale`,
    # dropping any leftover samples at the end
    x = np.asarray(x, dtype=float)
    n_windows = x.shape[0] // scale
    X = x[:n_windows * scale].reshape(n_windows, scale)
    scale_ax = np.arange(scale)  # Local time axis within a window
    rms = np.zeros(X.shape[0])
    for e, xcut in enumerate(X):
        coeff = np.polyfit(scale_ax, xcut, 1)  # Linear regression
        xfit = np.polyval(coeff, scale_ax)  # Fitted values
        rms[e] = np.sqrt(np.mean((xcut - xfit)**2))  # RMS of the detrended data
    return rms
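To see what the detrending in Step 2 does, here is a minimal sketch on two synthetic profiles: a pure ramp, and the same ramp with a ±1 alternation added. The helper `segment_rms` and both test signals are assumptions made for illustration. The ramp should leave essentially zero residual, while the alternating signal should give a residual close to, but slightly below, 1, because the fitted line absorbs a little of the alternation.

```python
import numpy as np

def segment_rms(y, n):
    """RMS residual of a least-squares line in each length-n window (illustrative helper)."""
    y = np.asarray(y, dtype=float)
    n_seg = y.size // n
    t = np.arange(n)
    out = np.empty(n_seg)
    for i, seg in enumerate(y[:n_seg * n].reshape(n_seg, n)):
        slope, intercept = np.polyfit(t, seg, 1)   # local linear trend
        out[i] = np.sqrt(np.mean((seg - (slope * t + intercept)) ** 2))
    return out

ramp = 0.5 * np.arange(40) + 3.0            # purely linear profile
zigzag = ramp + np.tile([1.0, -1.0], 20)    # same trend plus a +/-1 alternation

rms_ramp = segment_rms(ramp, 10)            # four windows of length 10
rms_zigzag = segment_rms(zigzag, 10)
print(rms_ramp)
print(rms_zigzag)
```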
Step 3: Calculating Total Fluctuation
The total fluctuation at scale n is the root-mean-square of the segment fluctuations \(F(n, i)\):
\[F(n)={\sqrt {{\frac {1}{N/n}}\sum _{i=1}^{N/n}F(n,i)^{2}}}\]
Python Code
def calculate_fluctuations(y, scales):
    fluct = np.zeros(len(scales))
    for e, sc in enumerate(scales):
        # Root-mean-square of the segment fluctuations at scale sc
        fluct[e] = np.sqrt(np.mean(calc_rms(y, sc)**2))
    return fluct
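A quick numeric check of the Step 3 formula, using two made-up segment fluctuations, shows that the total is a root-mean-square rather than a plain average of the segment values:

```python
import numpy as np

# Hypothetical segment fluctuations F(n, 1) and F(n, 2)
segment_fluct = np.array([3.0, 4.0])

total = np.sqrt(np.mean(segment_fluct ** 2))   # sqrt((9 + 16) / 2) = sqrt(12.5)
plain_mean = segment_fluct.mean()              # 3.5, shown for comparison only
print(total, plain_mean)
```

The root-mean-square weights larger segment fluctuations more heavily, so it is never smaller than the arithmetic mean.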
Step 4: Estimating the DFA Exponent
Finally, we fit a line to the logarithmic values of the fluctuation function versus the scale to estimate the DFA exponent α.
\[\log F(n) = \alpha \log n + \beta\]
Python Code
import matplotlib.pyplot as plt

def dfa(x, scale_lim=(5, 9), scale_dens=0.25, show=False):
    y = cumulative_sum(x)
    # Logarithmically spaced scales from 2**scale_lim[0] up to (excluding) 2**scale_lim[1]
    scales = (2 ** np.arange(scale_lim[0], scale_lim[1], scale_dens)).astype(int)
    fluct = calculate_fluctuations(y, scales)
    # The slope of log F(n) versus log n is the DFA exponent
    coeff = np.polyfit(np.log2(scales), np.log2(fluct), 1)
    if show:
        plt.loglog(scales, fluct, 'bo')
        plt.loglog(scales, 2**np.polyval(coeff, np.log2(scales)), 'r', label=r'$\alpha$ = %0.2f' % coeff[0])
        plt.title('DFA')
        plt.xlabel(r'$\log_{10}$(time window)')
        plt.ylabel(r'$\log_{10} <F(t)>$')
        plt.legend()
        plt.show()
    return scales, fluct, coeff[0]
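The log-log fit of Step 4 can be checked in isolation on an exact power law. The sketch below assumes a synthetic fluctuation function \(F(n) = 2 n^{0.7}\) (the constants are arbitrary) and recovers the exponent as the slope of the fitted line; with base-2 logarithms the intercept is \(\log_2 2 = 1\).

```python
import numpy as np

scales = np.array([16, 32, 64, 128, 256])
fluct = 2.0 * scales ** 0.7      # synthetic F(n) = 2 * n**0.7, not real data

# Fit log2 F(n) = alpha * log2 n + beta
alpha, beta = np.polyfit(np.log2(scales), np.log2(fluct), 1)
print(alpha, beta)               # close to 0.7 and 1.0
```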
A straight line of slope \(\alpha\) on the log-log plot indicates a statistical self-affinity of form \(F(n)\propto n^{\alpha }\). Since \(F(n)\) monotonically increases with \(n\), we always have \(\alpha>0\).
The scaling exponent \(\alpha\) is a generalization of the Hurst exponent, and its value gives information about the self-correlations of the series:
· \(\alpha <1/2\): anti-correlated
· \(\alpha \simeq 1/2\): uncorrelated, white noise
· \(\alpha >1/2\): correlated
· \(\alpha \simeq 1\): 1/f-noise, pink noise
· \(\alpha >1\): non-stationary, unbounded
· \(\alpha \simeq 3/2\): Brownian noise
Because the expected displacement in an uncorrelated random walk of length N grows like \(\sqrt {N}\), an exponent of \(\tfrac {1}{2}\) would correspond to uncorrelated white noise. When the exponent is between 0 and 1, the result is fractional Gaussian noise.
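One way to sanity-check these reference values is to run DFA on synthetic signals whose exponents are known. The sketch below is a compact, self-contained variant of the steps above and is not the article's implementation: the function name `dfa_alpha`, the scale list, the series length, and the random seed are all assumptions. It fits every segment of a given scale in a single `np.polyfit` call by passing the segments as columns. White noise should give an estimate near 1/2 and its cumulative sum (Brownian noise) an estimate near 3/2, up to sampling error.

```python
import numpy as np

def dfa_alpha(x, scales):
    """Estimate the DFA exponent of x with linear detrending (sketch)."""
    x = np.asarray(x, dtype=float)
    profile = np.cumsum(x - x.mean())                    # Step 1
    fluct = []
    for n in scales:
        n_seg = profile.size // n
        # One column per segment, so polyfit fits all segments at once
        segments = profile[:n_seg * n].reshape(n_seg, n).T
        t = np.arange(n)
        slope, intercept = np.polyfit(t, segments, 1)    # Step 2: local trends
        trend = np.outer(t, slope) + intercept
        # Step 3: RMS over all points equals the RMS of the segment fluctuations
        fluct.append(np.sqrt(np.mean((segments - trend) ** 2)))
    alpha, _ = np.polyfit(np.log(scales), np.log(fluct), 1)   # Step 4
    return alpha

rng = np.random.default_rng(0)
white = rng.standard_normal(2 ** 14)   # uncorrelated noise
brown = np.cumsum(white)               # random walk built from that noise

scales = [16, 32, 64, 128, 256]
alpha_white = dfa_alpha(white, scales)
alpha_brown = dfa_alpha(brown, scales)
print(alpha_white, alpha_brown)
```

If the two estimates were far from 1/2 and 3/2, that would point to a bug in the implementation rather than to a property of the data.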
Complete Code Example
Combining all the parts, here’s the complete code for DFA applied to the daily returns of Bitcoin over the past 5 years:
import numpy as np
import matplotlib.pyplot as plt

def cumulative_sum(x):
    # Subtract the mean, then integrate to obtain the profile y(t)
    x = np.asarray(x, dtype=float)
    return np.cumsum(x - np.mean(x))

def calc_rms(x, scale):
    # Split x into non-overlapping windows of length `scale`,
    # dropping any leftover samples at the end
    x = np.asarray(x, dtype=float)
    n_windows = x.shape[0] // scale
    X = x[:n_windows * scale].reshape(n_windows, scale)
    scale_ax = np.arange(scale)  # Local time axis within a window
    rms = np.zeros(X.shape[0])
    for e, xcut in enumerate(X):
        coeff = np.polyfit(scale_ax, xcut, 1)  # Linear regression
        xfit = np.polyval(coeff, scale_ax)  # Fitted values
        rms[e] = np.sqrt(np.mean((xcut - xfit)**2))  # RMS of the detrended data
    return rms

def calculate_fluctuations(y, scales):
    fluct = np.zeros(len(scales))
    for e, sc in enumerate(scales):
        # Root-mean-square of the segment fluctuations at scale sc
        fluct[e] = np.sqrt(np.mean(calc_rms(y, sc)**2))
    return fluct

def dfa(x, scale_lim=(5, 9), scale_dens=0.25, show=False):
    y = cumulative_sum(x)
    # Logarithmically spaced scales from 2**scale_lim[0] up to (excluding) 2**scale_lim[1]
    scales = (2 ** np.arange(scale_lim[0], scale_lim[1], scale_dens)).astype(int)
    fluct = calculate_fluctuations(y, scales)
    # The slope of log F(n) versus log n is the DFA exponent
    coeff = np.polyfit(np.log2(scales), np.log2(fluct), 1)
    if show:
        plt.loglog(scales, fluct, 'bo')
        plt.loglog(scales, 2**np.polyval(coeff, np.log2(scales)), 'r', label=r'$\alpha$ = %0.2f' % coeff[0])
        plt.title('DFA')
        plt.xlabel(r'$\log_{10}$(time window)')
        plt.ylabel(r'$\log_{10} <F(t)>$')
        plt.legend()
        plt.show()
    return scales, fluct, coeff[0]

if __name__ == '__main__':
    import yfinance as yf
    # Download BTC-USD data from yfinance
    data = yf.download('BTC-USD', period='5y')
    # Calculate returns: (P_t - P_t-1) / P_t-1
    r = data['Close'].diff() / data['Close'].shift(1)
    r.dropna(inplace=True)
    # Flatten to a 1-D array in case 'Close' comes back as a one-column table
    scales, fluct, alpha = dfa(r.to_numpy().ravel(), show=True)
    print("Scales:", scales)
    print("Fluctuations:", fluct)
    print("DFA Exponent: {}".format(alpha))
We get \(\alpha = 0.57\), slightly above 1/2, which indicates weakly persistent behavior. This means that trends tend to continue. In financial markets, this suggests that an upward movement in price tends to be followed by further upward movements, and vice versa for downward movements. The exact value depends on the date range downloaded.
Conclusion
This article provided a comprehensive guide to Detrended Fluctuation Analysis (DFA). By following the outlined steps, you can analyze the scaling behavior of your time series data effectively. The accompanying code snippets allow you to implement DFA and visualize the results, aiding in understanding the long-range correlations in your data.