Data obtained from observations collected sequentially over time are extremely common [1].
Time series analysis is the effort of extracting insightful summary and statistical information from observations arranged in chronological order [2].
there are several possible objectives in analysing a time series [2]:
first example: let \(e_1, \dots, e_n\) be a sequence of iid random variables with mean 0 and variance \(\sigma^2\)
\[ \begin{aligned} Y_1&=e_1 \\ Y_2&=e_1+e_2 \\ \vdots \\ Y_t &= e_1+e_2+\dots+e_t \end{aligned} \] or \(Y_t=Y_{t-1}+e_t\) with initial condition \(Y_1=e_1\). here \(E[Y_t]=0\) and \(var[Y_t]=t\sigma^2\), so the variance increases linearly with time t. for \(1 \le t \le s\), the covariance is \(\gamma_{t,s}=Cov(Y_t, Y_s)=\sum_{i=1}^{s}\sum_{j=1}^{t}Cov(e_i, e_j)=t\sigma^2\), and the autocorrelation of the random walk is \(\rho_{t,s}=\frac{\gamma_{t,s}}{\sqrt{\gamma_{t,t}\gamma_{s,s}}}=\frac{t\sigma^2}{\sqrt{t\sigma^2\cdot s\sigma^2}}=\sqrt{\frac{t}{s}}\) for \(1 \le t \le s\).
\[ \rho_{1,2}=\sqrt{\frac{1}{2}}=0.707, \rho_{8,9}=\sqrt{\frac{8}{9}}=0.943 \\ \rho_{24,25}=\sqrt{\frac{24}{25}}=0.980, \rho_{1,9}=\sqrt{\frac{1}{9}}=0.333 \] the observations of Y at neighboring time points become more and more positively correlated as time goes by, while observations of Y at distant time points are less and less correlated.
The simple random walk process provides a good approximation for phenomena as diverse as the movement of common stock price, and the movement of small particles in a fluid—so-called Brownian motion.
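As a quick illustration (a sketch only; the seed, sample size, and Gaussian noise are arbitrary choices), a random walk can be simulated by cumulatively summing iid noise and its slowly decaying sample autocorrelations inspected with acf():

```r
# simulate a random walk Y_t = Y_{t-1} + e_t with e_t ~ iid N(0, 1)
set.seed(1)
e <- rnorm(200)
y <- cumsum(e)   # Y_t = e_1 + e_2 + ... + e_t
plot.ts(y)       # wandering, trend-like behaviour typical of a random walk
acf(y)           # neighbouring observations are highly correlated; decay is slow
```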
let \(\{Y_t\}\) be defined by \(Y_t=\frac{e_t+e_{t-1}}{2}\) where \(e_i \overset{iid}{\sim} f(0, \sigma^2)\). then \(\mu_t=E[Y_t]=\frac{E(e_t)+E(e_{t-1})}{2}=0\), \(var(Y_t)=\frac{\sigma^2}{2}\), \(\gamma_{t,s}=\begin{cases}0.5 \sigma^2, & |t-s|=0 \\ 0.25\sigma^2, & |t-s|=1 \\ 0, & |t-s|>1\end{cases}\) for all t, and \(\rho_{t,s}=\begin{cases}1, & |t-s|=0 \\ 0.5, & |t-s|=1 \\ 0, & |t-s|>1\end{cases}\).
note that \(\rho_{2,1}=\rho_{3,2}=\rho_{4,3}=\rho_{9,8}=0.5\), \(\rho_{3,1}=\rho_{4,2}=\rho_{t,t-2}=0\), and in general \(\rho_{t,t-k}\) is the same for all t.
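These derived autocorrelations can be checked empirically with a small simulation (a sketch; the noise distribution and sample size are arbitrary):

```r
# simulate Y_t = (e_t + e_{t-1}) / 2 and inspect the sample autocorrelations
set.seed(1)
e <- rnorm(1001)
y <- (e[-1] + e[-1001]) / 2   # average of consecutive noise terms
acf(y, lag.max = 5)           # roughly 0.5 at lag 1 and near 0 at larger lags
```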
The classic Beveridge wheat price index series consists of the average wheat price in nearly 50 places in various countries measured in successive years from 1500 to 1869.
a particular realization of a stochastic process gives an observed time series. time-series analysis is essentially concerned with evaluating the properties of the underlying probabilistic model from the realized time series.
many models for stochastic processes are expressed by means of an algebraic formula relating the random variable at time t to past values of the process, together with a residual term. the first and second moments, the autocovariance, and the autocorrelation function are the key concepts used to describe such a process.
stationary processes are an important class of stochastic processes:
a discrete-time process is called purely random if it consists of a sequence of iid random variables \(Y(t)\); we further assume \(Y(t) \sim N(0, \sigma^2)\). Thus, \(\gamma(k)=Cov(Y(t), Y(t+k))=\begin{cases}\sigma^2, & k = 0 \\ 0, & k \ne 0\end{cases}\) and \(\rho(k)=\begin{cases} 1, & k=0 \\ 0, & k \ne 0\end{cases}\). a purely random process is both weakly and strictly stationary, since the mean and autocorrelation function do not depend on time.
suppose the sequence \(\{Y(t)\}\) is a discrete-time, purely random process with \(Y(t) \sim f(\mu, \sigma^2)\). \(X(t)\) is a random walk if \(X(t)=X(t-1)+Y(t)\). setting \(X(0)=0\), we have \(X(1)=Y(1)\) and \(X(t)=\sum_{i=1}^{t}Y(i)\), so \(E[X(t)]=t\mu\) and \(var[X(t)]=t\sigma^2\). the mean and variance depend on time t, thus random walk processes are not stationary.
a good example of a random walk is share prices on successive days: share price on day t = share price on day (t-1) + random error.
suppose the sequence \(\{Y(t)\}\) is a discrete-time, purely random process with \(Y(t) \sim f(0, \sigma^2)\). \(X(t)\) is a moving average process of order \(q\), MA(q), if \(X(t)=\beta_0Y(t)+\beta_1Y(t-1)+\dots+\beta_q Y(t-q)\), where the \(\{\beta_i\}\) are constants. the process is usually scaled so that \(\beta_0=1\).
\(E[X(t)]=E[\beta_0Y(t)+\beta_1Y(t-1)+\dots+\beta_q Y(t-q)]=0, var[X(t)]=\sigma^2 \sum_{i=0}^{q}\beta_i^2\).
\[ \rho(k)= \begin{cases} 1, & k=0 \\ \sum_{i=0}^{q-k}\beta_i\beta_{i+k}/\sum_{i=0}^{q}\beta_i^2, & k=1,\dots, q \\ 0, & k>q \\ \rho(-k), & k < 0 \end{cases} \]
since the autocovariance/autocorrelation function does not depend on time t and the mean is constant, the moving average (MA) process is second-order stationary; if the \(Y(t)\) are normally distributed, then the MA process is a strictly stationary normal process.
simulate an MA(1) process driven by white noise, \(Y(t) \sim N(0,1)\), \(X(t)=Y(t)-0.8Y(t-1)\), and an MA(2) process, \(X(t)=Y(t)+0.7Y(t-1)-0.2Y(t-2)\):
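A minimal sketch using arima.sim() (the seed and sample size are assumptions; arima.sim() uses the convention \(X(t)=Y(t)+\theta_1 Y(t-1)+\dots\), so the MA(1) coefficient is entered as -0.8):

```r
# simulate the MA(1) and MA(2) processes defined above
set.seed(1)
ma1 <- arima.sim(model = list(ma = -0.8), n = 500)          # X(t) = Y(t) - 0.8 Y(t-1)
ma2 <- arima.sim(model = list(ma = c(0.7, -0.2)), n = 500)  # X(t) = Y(t) + 0.7 Y(t-1) - 0.2 Y(t-2)
par(mfrow = c(2, 2))
plot.ts(ma1); acf(ma1)   # sample ac.f. cuts off after lag 1
plot.ts(ma2); acf(ma2)   # sample ac.f. cuts off after lag 2
```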
there are no requirements on the \(\{\beta_i\}\) for an MA process to be stationary, but it is generally desirable to impose some restrictions so that the process is invertible. the imposition of the invertibility condition ensures that there is a unique invertible first-order MA process for a given autocorrelation function.
let the sequence \(\{Y(t)\}\) be a discrete-time, purely random process with \(Y(t) \sim f(0, \sigma^2)\). \(X(t)\) is an autoregressive process of order \(p\), AR(p), if \(X(t)=\alpha_1X(t-1)+\dots+\alpha_pX(t-p)+Y(t)\).
for the AR(1) case,
\[ \begin{aligned} X(t) &= \alpha X(t-1)+Y(t) \\ &= \alpha(\alpha X(t-2)+Y(t-1))+Y(t) \\ &= \alpha[\alpha (\alpha X(t-3)+Y(t-2))+Y(t-1)]+Y(t) = \alpha^3X(t-3)+\alpha^2Y(t-2)+\alpha Y(t-1) +Y(t) \\ &=Y(t) + \alpha Y(t-1) + \alpha^2Y(t-2)+\dots \end{aligned} \] where \(-1 < \alpha < 1\). \(E[X(t)]=0\), \(var[X(t)]=\sigma^2 (1+\alpha^2 + \alpha^4 + \cdots)=\sigma^2/(1-\alpha^2)\), and \(\gamma(k)=\alpha^k\sigma^2/(1-\alpha^2)\), which does not depend on time t; thus an AR(1) process is second-order stationary provided that \(|\alpha|<1\).
three AR(1) examples are given below with \(\alpha=0.8, -0.8, 0.3\):
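A sketch of how these three AR(1) series can be simulated (the seed and sample size are arbitrary choices):

```r
# simulate AR(1) processes with alpha = 0.8, -0.8, 0.3 and plot their correlograms
set.seed(1)
par(mfrow = c(3, 2))
for (a in c(0.8, -0.8, 0.3)) {
  x <- arima.sim(model = list(ar = a), n = 500)
  plot.ts(x, main = paste("AR(1), alpha =", a))
  acf(x, main = paste("sample ac.f., alpha =", a))
}
```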
Note how quickly the ac.f. decays when \(\alpha = 0.3\), and note how the ac.f. alternates when \(\alpha < 0\).
AR processes have been applied to many situations in which it is reasonable to assume that the present value of a time series depends linearly on the immediate past values together with a random noise.
combining AR and MA processes gives rise to a mixed autoregressive/moving-average process with p AR terms and q MA terms, denoted by ARMA(p, q) and formulated as: \[ X(t)= \alpha_1X(t-1)+\dots+\alpha_pX(t-p)+Y(t)+\beta_1Y(t-1)+\dots+\beta_q Y(t-q) \]
the importance of the ARMA model lies in the fact that a stationary time series can often be modelled with fewer parameters by an ARMA model than by a pure AR or MA process [principle of parsimony].
consider ARMA(1,1) and ARMA(2,2):
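A sketch of simulating these two mixed processes (the coefficient values, seed, and sample size are illustrative assumptions):

```r
# simulate an ARMA(1,1) and an ARMA(2,2) process
set.seed(1)
arma11 <- arima.sim(model = list(ar = 0.6, ma = -0.4), n = 500)
arma22 <- arima.sim(model = list(ar = c(0.4, -0.3), ma = c(-0.6, 0.5)), n = 500)
par(mfrow = c(2, 2))
plot.ts(arma11); acf(arma11)
plot.ts(arma22); acf(arma22)
```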
non-stationary time series are the most common in practice. to use the stationary models mentioned above, we need to first remove the non-stationary sources of variation.
such a model is called an “integrated” model, since the stationary model fitted to the differenced data has to be summed, or integrated, to provide an adequate model for the original non-stationary data; the differencing is written as \(W(t)=\Delta^dX(t)=(1-B)^dX(t)\), where B denotes the backward shift operator.
the general autoregressive integrated moving average (ARIMA) process takes the form of
\[ W(t)=\alpha_1W(t-1)+\dots+\alpha_pW(t-p)+Y(t)+\beta_1Y(t-1)+\dots+\beta_qY(t-q) \] the differencing step in ARIMA models is commonly used to remove a trend from the data. if \(d=0\), an ARIMA(p,d,q) model reduces to an ARMA(p, q) process. An example of an ARIMA(1,1,1) process is given below:
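A sketch of simulating an ARIMA(1,1,1) series with arima.sim(), which accepts the full order including the degree of differencing (the coefficients, seed, and length are illustrative assumptions):

```r
# simulate an ARIMA(1,1,1) process: the d = 1 component integrates (cumulatively
# sums) an underlying ARMA(1,1) series, producing a trend-like, non-stationary path
set.seed(1)
x <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.6, ma = 0.4), n = 500)
plot.ts(x)
acf(x)   # slowly decaying sample ac.f., reflecting the non-stationarity
```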
Note that the ac.f. of the two integrated processes is slowly decaying because of the trend. ARIMA models can be generalized to include seasonal terms.
a variant of ARIMA models arises with the use of fractional differencing, leading to the fractionally integrated ARMA (ARFIMA) model. an ARFIMA model is exactly the same as an ARIMA(p,d,q) model, except that d is no longer restricted to being an integer.
if \(0 < d < 0.5\), an ARFIMA model is of particular interest because such a process is not only stationary but also a long-memory model. by long memory, we mean that the autocorrelation decays very slowly. for most stationary time-series models the autocorrelation function decreases fairly fast; for example, the autocorrelation of an AR(1) model decays at an exponential rate. for a long-memory model, however, the correlations decay to 0 so slowly that observations far apart are still correlated.
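The object names printed below suggest a series simulated with fracdiff.sim() from the fracdiff package; a minimal sketch (the sample size and the value d = 0.4 are assumptions):

```r
require(fracdiff)                          # provides fracdiff.sim()
set.seed(1)
lm.sim <- fracdiff.sim(n = 1000, d = 0.4)  # ARFIMA(0, d, 0) with 0 < d < 0.5
names(lm.sim)                              # the simulated series is lm.sim$series
acf(lm.sim$series, lag.max = 100)          # autocorrelations decay very slowly
```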
## Loading required package: fracdiff
## [1] "series" "ar" "ma" "d" "mu" "n.start"
A long-memory (stationary) process and a non-stationary process: distinguishing a stationary, long-memory process from a non-stationary process is difficult in practice, since both have an autocorrelation function that dies out slowly and large values at zero frequency in the spectrum.
any discrete-time stationary process can be expressed as the sum of two uncorrelated processes, one purely deterministic and one purely indeterministic (the Wold decomposition):
the theoretical autocorrelation function is an important descriptive tool for the properties of a stationary stochastic process.
provided the time series is stationary, sample autocorrelation function is an intuitive estimator of the theoretical autocorrelation function.
given observations \(x_1, \dots, x_N\) from a stationary process, the sample autocovariance coefficient at lag k is given by
\[ c_k=\sum_{t=1}^{N-k}(x_t-\bar{x})(x_{t+k}-\bar{x})/N \]
which is an intuitive estimator for \(\gamma(k)\) at lag k. however, this estimator is biased, \(E[c_k] \ne \gamma(k)\); but \(\lim_{N\rightarrow \infty}E[c_k]=\gamma(k)\), thus the sample autocovariance function is asymptotically unbiased.
to estimate the autocorrelation function, we take \(r_k=c_k/c_0\) as an estimator for \(\rho(k)\). \(r_k\) is generally biased; its sampling properties are easiest to study when sampling from a purely random process, where all theoretical autocorrelation coefficients are 0 except at lag 0.
\[ E[r_k] \approx -1/N, \quad var[r_k] \approx 1/N \]
- assess the randomness of a stochastic process:
\(r_k\) is approximately normally distributed as \(N \rightarrow \infty\) under weak conditions. plotting the limits \(\pm\frac{1.96}{\sqrt{N}}\) together with the sample autocorrelation coefficients helps determine whether the process is truly random. note that this is at the \(\alpha=5\%\) level, so if we plot, say, 20 coefficients we expect about 1 of them to fall outside the limits by chance; if we observe more than 1 significant coefficient, we conclude the process is not random.
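A sketch illustrating both points: the sample autocovariance/autocorrelation computed "by hand" agrees with acf(), and the correlogram of a purely random series stays mostly inside the \(\pm 1.96/\sqrt{N}\) limits (the example series and the lag k are arbitrary choices):

```r
set.seed(1)
x <- arima.sim(model = list(ar = 0.5), n = 200)
N <- length(x); k <- 3
ck <- sum((x[1:(N - k)] - mean(x)) * (x[(k + 1):N] - mean(x))) / N   # sample autocovariance c_k
rk <- ck / (sum((x - mean(x))^2) / N)                                # r_k = c_k / c_0
c(ck, rk)
acf(x, plot = FALSE)$acf[k + 1]   # matches rk
# assessing randomness: acf() draws the approximate 95% limits at +/- 1.96 / sqrt(N)
z <- rnorm(200)                   # a purely random series
acf(z)                            # about 1 in 20 spikes may exceed the limits by chance
```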
the correlogram is also helpful in trying to identify appropriate models for a given time series realization. in particular, it is good for selecting the most appropriate type of auto-regressive integrated moving average (ARIMA) model.
although the sample mean is used when estimating the sample autocovariance and autocorrelation functions, it is a potentially misleading statistic unless all systematic components have been removed; thus the sample mean should only be used when the data can be assumed to come from a stationary process.
after estimating the autocorrelation function of a given time series, the next step is to decide which model is a suitable description of the underlying stochastic process.
if an AR process seems appropriate, there are two related questions:
consider an AR(p) process with mean \(\mu\):
\[ X(t)-\mu=\alpha_1(X(t-1)-\mu)+\dots+\alpha_p (X(t-p)-\mu)+Y(t) \]
given N obs. \((x_1, \dots, x_N)\), the parameters \(\mu, \alpha_1, \dots, \alpha_p\) can be estimated by least squares, minimizing \(S=\sum_{t=p+1}^{N}[X(t)-\mu-\alpha_1(X(t-1)-\mu)-\dots-\alpha_p (X(t-p)-\mu)]^2\) w.r.t. \(\mu, \alpha_1, \dots, \alpha_p\). if \(Y(t)\) is normally distributed, then the least-squares estimates are also maximum likelihood estimates, conditional on the first p values of the time series being fixed.
it is generally hard to determine the order of an AR process based solely on the sample autocorrelation function; the sample partial autocorrelation function, which cuts off after lag p for an AR(p) process, is more useful here.
example: use the above procedure to fit AR processes to two simulated series, an AR(2) and an AR(3):
\[ \begin{aligned} X(t) &=-0.4X(t-1)+0.3X(t-2)+Y(t), \quad Y(t) \sim N(0,1) \\ X(t) &=0.25X(t-1)+0.5X(t-2)-0.125X(t-3)+Y(t), \quad Y(t) \sim N(0,1) \end{aligned} \]
Note that both sample ac.f.’s decay rapidly, and the sample partial ac.f.’s cut off at lags 2 and 3, respectively. the parameters of an AR process can be estimated in R using the arima() function:
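For example, the AR(2) series above can be simulated and fitted as follows (the seed, sample size, and object name x1 are assumptions, so the estimates will differ slightly from the output shown below):

```r
set.seed(1)
x1 <- arima.sim(model = list(ar = c(-0.4, 0.3)), n = 500)   # the AR(2) series above
arima(x1, order = c(2, 0, 0))
# the AR(3) series is handled the same way with order = c(3, 0, 0)
```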
##
## Call:
## arima(x = x1, order = c(2, 0, 0))
##
## Coefficients:
## ar1 ar2 intercept
## -0.4379 0.2483 0.0404
## s.e. 0.0434 0.0434 0.0363
##
## sigma^2 estimated as 0.9347: log likelihood = -692.85, aic = 1391.7
the fitted 2nd-order process is \(X(t)-\hat{\mu}=-0.4379\,(X(t-1)-\hat{\mu})+0.2483\,(X(t-2)-\hat{\mu})+Y(t)\), \(Y(t) \sim N(0, 0.9347)\), with \(\hat{\mu}=0.0404\) (the “intercept” reported by arima() is the estimated mean of the series).
suppose an MA process is thought to be a suitable model for a given time series; we then have two problems:
consider the MA(1) process with mean \(\mu\): \(X(t)-\mu=Y(t)+\beta_1Y(t-1)\), where \(\mu, \beta_1\) are to be estimated and \(Y(t)\) denotes a purely random process.
explicit least-squares estimates cannot be found; a practical solution is to compute the residuals \(Z(t)=X(t)-\hat{\mu}-\hat{\beta}_1Z(t-1)\) recursively from starting values of the parameters (with \(Z(0)=0\)), and then determine the parameter values that minimize \(\sum Z(t)^2\).
if an MA process is suitable for a time series, the order of the process is usually evident from the sample autocorrelation function: the theoretical autocorrelation function of an MA(q) process cuts off at lag q.
an example of using the above procedure to fit an MA process to two simulated series:
\[ \begin{aligned} X(t) & = Y(t)-0.5Y(t-1)+0.4Y(t-2), \quad Y(t)\sim N(0,1) \\ X(t) & = Y(t)+\frac{1}{3}Y(t-1)+\frac{1}{5}Y(t-2)-\frac{1}{9}Y(t-3), \quad Y(t)\sim N(0,1) \end{aligned} \]
Depending on whether one assumes the mean of the series is zero or not, parameters of an MA process with zero or nonzero mean can be estimated using:
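A sketch for the first simulated MA(2) series (the seed, sample size, and name x1 are assumptions); include.mean switches between the nonzero-mean and zero-mean fits:

```r
set.seed(1)
x1 <- arima.sim(model = list(ma = c(-0.5, 0.4)), n = 500)
arima(x1, order = c(0, 0, 2))                         # MA(2) with estimated mean
arima(x1, order = c(0, 0, 2), include.mean = FALSE)   # MA(2) with mean fixed at zero
```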
##
## Call:
## arima(x = x1, order = c(0, 0, 2))
##
## Coefficients:
## ma1 ma2 intercept
## -0.4216 0.3597 0.0405
## s.e. 0.0428 0.0387 0.0424
##
## sigma^2 estimated as 1.021: log likelihood = -714.92, aic = 1435.83
the estimated model is \(X(t) = 0.0405 + Y(t)-0.4216Y(t-1)+0.3597Y(t-2), \quad Y(t)\sim N(0,1.021)\).
fitting an ARMA model follows a similar iterative procedure for parameter estimation: the residual sum of squares can be calculated at every point on a grid of parameter values, and the combination giving the minimum sum of squares is the one we choose.
consider an ARMA(1,1) process, whose theoretical autocorrelation function decreases exponentially after lag 1:
\[ \begin{aligned} X(t)-\mu & =\alpha_1(X(t-1)-\mu)+Y(t)+\beta_1Y(t-1) \Rightarrow \\ Y(t) &= X(t)-\mu-\alpha_1(X(t-1)-\mu)-\beta_1Y(t-1) \end{aligned} \] given N obs., we guess values for \(\hat{\mu}, \hat{\alpha}_1, \hat{\beta}_1\) and then calculate the residuals recursively:
\[ Y(t) = X(t)-\hat{\mu}-\hat{\alpha}_1[X(t-1)-\hat{\mu}]-\hat{\beta}_1Y(t-1) \] the residual sum of squares \(\sum_{t=1}^{N}Y^2(t)\) is calculated and compared across parameter values until the minimum is found.
an example of an ARMA(2,2) process with N=500 and \(Y(t) \sim N(0, 1)\) is given as:
\[ X(t)=0.4X(t-1)-0.3X(t-2)+Y(t)-0.6Y(t-1)+0.5Y(t-2), \quad Y(t)\sim N(0,1) \]
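A sketch of how the series (here called x5, as in the fits below) can be simulated and fitted with both an under-specified ARMA(1,1) and the correct ARMA(2,2) order; the seed is arbitrary, so the estimates will differ slightly from those shown:

```r
set.seed(1)
x5 <- arima.sim(model = list(ar = c(0.4, -0.3), ma = c(-0.6, 0.5)), n = 500)
arima(x5, order = c(1, 0, 1))                         # ARMA(1,1), order too low
arima(x5, order = c(2, 0, 2), include.mean = FALSE)   # ARMA(2,2), mean fixed at zero
```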
##
## Call:
## arima(x = x5, order = c(1, 0, 1))
##
## Coefficients:
## ar1 ma1 intercept
## -0.2686 0.0139 0.0524
## s.e. 0.1291 0.1302 0.0352
##
## sigma^2 estimated as 0.9686: log likelihood = -701.51, aic = 1409.03
##
## Call:
## arima(x = x5, order = c(2, 0, 2), include.mean = F)
##
## Coefficients:
## ar1 ar2 ma1 ma2
## 0.2198 -0.3710 -0.4722 0.5362
## s.e. 0.1695 0.1503 0.1579 0.1294
##
## sigma^2 estimated as 0.9511: log likelihood = -697.03, aic = 1402.05
the fitted ARMA(2,2) model with mean 0 is: \(X(t)=0.2198X(t-1)-0.3710X(t-2)+Y(t)-0.4722Y(t-1)+0.5362Y(t-2)\), and the 95% confidence intervals cover the true values of the coefficients, suggesting the fitted model is not very different from the true model.
two standard tools used to identify an appropriate ARMA model for a given stationary time series:
the above model selection tools are subjective, matching a suitable model to the observed characteristics by eye.
objective tools are:
an example is given below:
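A sketch of how such an AIC table can be computed for the series x5, looping over candidate orders p, q = 0, ..., 3 (the grid limits are assumptions based on the 4 x 4 table below):

```r
# AIC for ARMA(p, q) fits with p, q = 0, ..., 3; row index is p + 1, column index is q + 1
aic.tab <- matrix(NA, 4, 4)
for (p in 0:3) {
  for (q in 0:3) {
    aic.tab[p + 1, q + 1] <- arima(x5, order = c(p, 0, q))$aic
  }
}
aic.tab
```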
## Warning in stats::arima(x = x, order = order, seasonal = seasonal, xreg = xreg,
## : possible convergence problem: optim gave code = 1
## [,1] [,2] [,3] [,4]
## [1,] 1438.120 1410.682 1405.330 1402.834
## [2,] 1407.041 1409.029 1405.072 1404.575
## [3,] 1409.020 1409.781 1402.372 1403.857
## [4,] 1405.134 1406.756 1403.851 1405.852
a lower AIC is preferred; p=q=2 gives the lowest AIC here.
in practice, it is hard to distinguish non-stationary from nearly non-stationary time series: the sample ac.f.’s of both decay (very) slowly towards zero.
Long-memory stationary processes are ‘nearly non-stationary’. for example, in short series the (stationary) AR(1) process with parameter 0.95, \(X(t)=0.95X(t-1)+Y(t)\), looks much like data generated by a (non-stationary) random walk, \(X(t)=X(t-1)+Y(t)\): both the time plots and the autocorrelation functions look similar.
model misspecification generally matters little for short-term forecasts, but it produces substantial differences in long-term forecasts; thus, it is important to distinguish between the two types of processes.
solution: apply a formal unit-root test, such as the augmented Dickey-Fuller test, of the null hypothesis that the differencing parameter d is exactly equal to one:
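A sketch of such a test using adf.test() from the tseries package (the package choice and the example series y1 are assumptions; the ADF null hypothesis is a unit root, i.e. d = 1, against the alternative of stationarity):

```r
library(tseries)                                  # provides adf.test()
set.seed(1)
y1 <- arima.sim(model = list(ar = 0.8), n = 500)  # a hypothetical stationary series
adf.test(y1)                                      # small p-value => reject the unit root
```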
## Augmented Dickey-Fuller Test
##
## data: y1
## Dickey-Fuller = -3.9493, Lag order = 7, p-value = 0.01154
## alternative hypothesis: stationary
Unfortunately, tests for unit roots generally have poor power, even for moderate size samples.
many time series are clearly non-stationary. thus we difference an observed time series until it appears to be stationary, and then we fit an AR, MA, or ARMA model to the differenced series.
generally, first-order differencing of non-seasonal data is adequate, while second-order differencing is sometimes needed. for seasonal time series, seasonal differencing may also be needed.
the possible use of differencing is the only change from fitting ARMA models:
many time series contain a seasonal periodic component which repeats every s observations. the generalized ARIMA model (SARIMA) can deal with seasonality, and is denoted SARIMA(p,d,q) × (P,D,Q)_s.
The sales series exhibits both a trend and seasonal variation. the following shows how to use Box-Jenkins models to fit a time series model to data containing a seasonal component and a trend component:
Note that the differenced series does not show any seasonal effect in the time series plot; the sample ac.f. and sample partial ac.f. of Y(t) are much smaller than those of the original series X(t).
We then try to fit an ARMA model to Y(t). Since the order of the ARMA model is not given, we fit a set of ARMA models and compute their AICs to select the best order:
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1648.682 1650.504 1651.521 1648.854 1635.755
## [2,] 1650.489 1652.140 1653.125 1642.301 1636.960
## [3,] 1652.209 1649.112 1641.001 1628.410 1630.405
## [4,] 1649.993 1643.663 1630.891 1630.401 1632.360
## [5,] 1636.412 1637.506 1629.802 1631.801 1631.935
The model with minimum AIC (1628.410) is ARMA(p, q) with p = 2 and q = 3. Hence we fit an ARMA(2,3) model to the differenced series Y(t):
##
## Call:
## arima(x = dwine, order = c(2, 0, 3))
##
## Coefficients:
## ar1 ar2 ma1 ma2 ma3 intercept
## -1.0226 -0.8662 1.1775 1.1854 0.4210 -213.3700
## s.e. 0.0691 0.0717 0.1011 0.1348 0.0915 35.2152
##
## sigma^2 estimated as 82830: log likelihood = -808.21, aic = 1628.41
the fitted model is \(Y(t)-\hat{\mu}=-1.0226(Y(t-1)-\hat{\mu})-0.8662(Y(t-2)-\hat{\mu})+Z(t)+1.1775Z(t-1)+1.1854Z(t-2)+0.4210Z(t-3)\), with \(\hat{\mu}=-213.37\) and \(Z(t) \sim N(0,82830)\) (the “intercept” reported by arima() is the estimated mean). the next step is residual diagnostics.
When a model has been fitted to a time series, it is important to check that the model really does provide an adequate description of the data. this is usually done by looking at the residuals, defined by residual = observation - fitted value.
consider the AR(1) model, \(X(t)=\alpha X(t-1)+Y(t), Y(t)\sim N(0,1)\), the fitted value is given by \(\hat{X}(t)=\hat{\alpha}x(t-1)\) and the residual at time t is given by \(\hat{Y}(t)=x(t)-\hat{\alpha}x(t-1)\).
if we have chosen a good model, we expect the residuals to behave like a purely random process centered around 0, since the residuals estimate the noise terms of the true model. plotting the residuals (for time series, as a time series plot) shows whether this is the case. this gives a two-step procedure for residual diagnostics:
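Beyond the plots, a portmanteau test of the residual autocorrelations can be carried out with Box.test(); a sketch for the wine model fitted above (assuming the fit is stored in dwine.fit):

```r
# Box-Pierce test on the residuals of the fitted ARMA(2,3) model
Box.test(dwine.fit$resid)   # defaults: lag = 1, type = "Box-Pierce"
```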
##
## Box-Pierce test
##
## data: dwine.fit$resid
## X-squared = 0.00089997, df = 1, p-value = 0.9761
Suppose we have an observed time series, \(x_1, \dots, x_N\). a basic problem is then to estimate future values such as \(x_{N+h}\), where h denotes the lead time or forecasting horizon. the forecast made at time N for h steps ahead is denoted by \(\hat{x}_N(h)\).
Forecasts are conditional statements about the future based on specific assumptions. It is advised to produce a set of forecasts based on different sets of assumptions so that alternative scenarios can be compared and explored.
forecasting methods can be classified into 4 categories:
intuitively, we forecast \(x_{N+1}\) by a weighted sum of past observations:
\[ \hat{x}_{N+1}=c_0x_N+\dots+c_{N-1}x_1 \]
where \(\{c_i\}\) are weights; it is sensible to assign bigger weights to more recent observations while keeping \(\sum_{i=0}^{N-1}c_i=1\). an example is the geometric weights:
\[ c_i=\alpha(1-\alpha)^i, \quad i=0,1,\dots, N-1, \alpha \in (0,1) \]
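These geometric weights correspond to simple exponential smoothing; in R this can be fitted with HoltWinters() with the trend and seasonal components switched off (the example series below is hypothetical):

```r
set.seed(1)
x <- arima.sim(model = list(ar = 0.7), n = 200)      # a hypothetical series to forecast
fit <- HoltWinters(x, beta = FALSE, gamma = FALSE)   # simple exponential smoothing
fit$alpha                                            # estimated smoothing parameter
predict(fit, n.ahead = 5)                            # forecasts for the next 5 steps
```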
simple exponential smoothing can be generalized to deal with time series containing a systematic trend and seasonal variation (the Holt-Winters procedure).
Box and Jenkins showed how the use of differencing can extend ARMA models to ARIMA to deal with non-stationary series; in addition, they also showed how to cope with seasonal variation by introducing seasonal ARIMA (SARIMA) models.
the main stages in setting up a Box-Jenkins forecasting model are:
now we use the Box-Jenkins procedure to analyze the monthly unemployment rates in the U.S. from January 1948 to October 2018:
To find a time series model with good out-of-sample performance, we split the time series into a training sample and test data.
the deseasonalized series is very persistent suggesting the series is not stationary. consider a unit-root test for the deseasonalized series:
##
## Augmented Dickey-Fuller Test
##
## data: unem.desea
## Dickey-Fuller = -3.6198, Lag order = 9, p-value = 0.03078
## alternative hypothesis: stationary
the p-value is smaller than 0.05, thus we reject the null and conclude the deseasonalized series is stationary. an ARIMA(p,1,q) model is then considered; next we select a good model using AIC:
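A sketch of the AIC search over ARIMA(p, 1, q) orders for the deseasonalized training series (the grid limits p, q = 0, ..., 3 are assumptions based on the 4 x 4 table below):

```r
aic.unem <- matrix(NA, 4, 4)
for (p in 0:3) {
  for (q in 0:3) {
    aic.unem[p + 1, q + 1] <- arima(unem.desea.series, order = c(p, 1, q))$aic
  }
}
aic.unem
```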
## [,1] [,2] [,3] [,4]
## [1,] -104.6192 -123.3318 -166.0700 -170.9881
## [2,] -132.6713 -161.3643 -174.0880 -172.2128
## [3,] -173.1796 -173.1427 -172.1949 -171.0363
## [4,] -174.0392 -176.2756 -175.2259 -173.7914
an ARIMA(3,1,1) model, which has the lowest AIC, seems to fit the deseasonalized series well.
##
## Call:
## arima(x = unem.desea.series, order = c(3, 1, 1))
##
## Coefficients:
## ar1 ar2 ar3 ma1
## -0.4935 0.3039 0.2109 0.6296
## s.e. 0.1517 0.0434 0.0406 0.1516
##
## sigma^2 estimated as 0.04699: log likelihood = 92.14, aic = -176.28
the fitted model for the deseasonalized series takes the form \(Y(t)=\widetilde{X}(t)-\widetilde{X}(t-1)\), \(Y(t)=-0.4935Y(t-1)+0.3039Y(t-2)+0.2109Y(t-3)+Z(t)+0.6296Z(t-1)\), where \(Z(t) \sim N(0,0.04699)\).
to perform residual diagnostics, we plot the time series of the standardized residuals \(\hat{Z}(t)\), their sample autocorrelation function, and the p-values of the Ljung-Box statistic.
##
## Box-Pierce test
##
## data: fit.arima311$resid
## X-squared = 0.0072558, df = 1, p-value = 0.9321
the p-value of the test is much larger than 0.05, indicating no significant autocorrelation in the residuals, so the fitted model appears adequate.
We then use the fitted model to forecast the rates for January, February, ..., October 2018. We compute the \(k\)-step-ahead forecasts and their \(95\%\) prediction intervals for \(\widetilde{X}(t)\) at \(k=1, \dots, 10\). To obtain predicted values on the original scale, the seasonal effects need to be added back:
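A sketch of the forecasting step (assuming the fit is stored in fit.arima311 and the estimated monthly seasonal effects in a hypothetical vector seas.eff, in calendar order):

```r
fc <- predict(fit.arima311, n.ahead = 10)   # 1- to 10-step-ahead forecasts for the deseasonalized series
upper <- fc$pred + 1.96 * fc$se             # approximate 95% prediction limits
lower <- fc$pred - 1.96 * fc$se
fc.rates <- fc$pred + seas.eff[1:10]        # add the seasonal effects back for Jan-Oct 2018
```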