Data obtained from observations collected sequentially over time are extremely common [1].
Time series analysis is the effort of extracting insightful summary and statistical information from observations arranged in chronological order [2].
there are several possible objectives in analysing a time series [2]:
first example: let \(e_1, \dots, e_n\) be a sequence of iid random variables with mean 0 and variance \(\sigma^2\)
\[ \begin{aligned} Y_1&=e_1 \\ Y_2&=e_1+e_2 \\ \vdots \\ Y_t &= e_1+e_2+\dots+e_t \end{aligned} \] or \(Y_t=Y_{t-1}+e_t\) with initial condition \(Y_1=e_1\). here \(E[Y_t]=0\) and \(var[Y_t]=t\sigma^2\), so the variance increases linearly with time t. for \(1 \le t \le s\), the covariance is \(\gamma_{t,s}=Cov(Y_t, Y_s)=\sum_{i=1}^{s}\sum_{j=1}^{t}Cov(e_i, e_j)=t\sigma^2\), and the autocorrelation of the random walk is \(\rho_{t,s}=\frac{\gamma_{t,s}}{\sqrt{\gamma_{t,t}\gamma_{s,s}}}=\frac{t\sigma^2}{\sqrt{t\sigma^2\cdot s\sigma^2}}=\sqrt{\frac{t}{s}}\) for \(1 \le t \le s\).
\[ \rho_{1,2}=\sqrt{\frac{1}{2}}=0.707, \rho_{8,9}=\sqrt{\frac{8}{9}}=0.943 \\ \rho_{24,25}=\sqrt{\frac{24}{25}}=0.980, \rho_{1,9}=\sqrt{\frac{1}{9}}=0.333 \] the observations of Y at neighboring time points become more and more positively correlated as time goes by, while observations of Y at distant time points are less and less correlated.
The simple random walk process provides a good approximation for phenomena as diverse as the movement of common stock price, and the movement of small particles in a fluid—so-called Brownian motion.
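As a quick illustration (a sketch only; the seed, sample size, and Gaussian noise are arbitrary choices), a random walk can be simulated by cumulatively summing iid noise and its slowly decaying sample autocorrelations inspected with acf():

```r
# simulate a random walk Y_t = Y_{t-1} + e_t with e_t ~ iid N(0, 1)
set.seed(1)
e <- rnorm(200)
y <- cumsum(e)   # Y_t = e_1 + e_2 + ... + e_t
plot.ts(y)       # wandering, trend-like behaviour typical of a random walk
acf(y)           # neighbouring observations are highly correlated; decay is slow
```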
let \(\{Y_t\}\) be defined by \(Y_t=\frac{e_t+e_{t-1}}{2}\) where \(e_i \overset{iid}{\sim} f(0, \sigma^2)\). then \(\mu_t=E[Y_t]=\frac{E(e_t)+E(e_{t-1})}{2}=0\), \(var(Y_t)=\frac{\sigma^2}{2}\), \(\gamma_{t,s}=\begin{cases}0.5 \sigma^2, & |t-s|=0 \\ 0.25\sigma^2, & |t-s|=1 \\ 0, & |t-s|>1\end{cases}\) for all t, and \(\rho_{t,s}=\begin{cases}1, & |t-s|=0 \\ 0.5, & |t-s|=1 \\ 0, & |t-s|>1\end{cases}\).
note that \(\rho_{2,1}=\rho_{3,2}=\rho_{4,3}=\rho_{9,8}=0.5\), \(\rho_{3,1}=\rho_{4,2}=\rho_{t,t-2}=0\), and in general \(\rho_{t,t-k}\) is the same for all t.
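These derived autocorrelations can be checked empirically with a small simulation (a sketch; the noise distribution and sample size are arbitrary):

```r
# simulate Y_t = (e_t + e_{t-1}) / 2 and inspect the sample autocorrelations
set.seed(1)
e <- rnorm(1001)
y <- (e[-1] + e[-1001]) / 2   # average of consecutive noise terms
acf(y, lag.max = 5)           # roughly 0.5 at lag 1 and near 0 at larger lags
```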
The classic Beveridge wheat price index series consists of the average wheat price in nearly 50 places in various countries measured in successive years from 1500 to 1869.
a particular realization of a stochastic process gives an observed time series. time-series analysis is essentially concerned with evaluating the properties of the underlying probabilistic model from the realized time series.
many models for stochastic processes are expressed by means of an algebraic formula relating the random variable at time t to past values of the process, together with a residual term. the first and second moments, the autocovariance, and the autocorrelation function are the key concepts used to describe such a process.
stationary processes are an important class of stochastic processes:
a discrete-time process is called purely random if it consists of a sequence of iid random variables \(Y(t)\); we further assume \(Y(t) \sim N(0, \sigma^2)\). Thus, \(\gamma(k)=Cov(Y(t), Y(t+k))=\begin{cases}\sigma^2, & k = 0 \\ 0, & k \ne 0\end{cases}\) and \(\rho(k)=\begin{cases} 1, & k=0 \\ 0, & k \ne 0\end{cases}\). a purely random process is both weakly and strictly stationary, since the mean and autocorrelation function do not depend on time.
suppose the sequence \(\{Y(t)\}\) is a discrete-time, purely random process with \(Y(t) \sim f(\mu, \sigma^2)\). \(X(t)\) is a random walk if \(X(t)=X(t-1)+Y(t)\). setting \(X(0)=0\), we have \(X(1)=Y(1)\) and \(X(t)=\sum_{i=1}^{t}Y(i)\), so \(E[X(t)]=t\mu\) and \(var[X(t)]=t\sigma^2\). the mean and variance depend on time t, thus random walk processes are not stationary.
a good example of a random walk is share prices on successive days: share price on day t = share price on day (t-1) + random error.
suppose the sequence \(\{Y(t)\}\) is a discrete-time, purely random process with \(Y(t) \sim f(0, \sigma^2)\). \(X(t)\) is a moving average process of order \(q\), MA(q), if \(X(t)=\beta_0Y(t)+\beta_1Y(t-1)+\dots+\beta_q Y(t-q)\), where the \(\{\beta_i\}\) are constants. the process is usually scaled so that \(\beta_0=1\).
\(E[X(t)]=E[\beta_0Y(t)+\beta_1Y(t-1)+\dots+\beta_q Y(t-q)]=0, var[X(t)]=\sigma^2 \sum_{i=0}^{q}\beta_i^2\).
\[ \rho(k)= \begin{cases} 1, & k=0 \\ \sum_{i=0}^{q-k}\beta_i\beta_{i+k}/\sum_{i=0}^{q}\beta_i^2, & k=1,\dots, q \\ 0, & k>q \\ \rho(-k), & k < 0 \end{cases} \]
since the autocovariance/autocorrelation function does not depend on time t and the mean is constant, the moving average (MA) process is second-order stationary; if the \(Y(t)\) are normally distributed, then the MA process is a strictly stationary normal process.
simulate an MA(1) process driven by white noise, \(Y(t) \sim N(0,1)\), \(X(t)=Y(t)-0.8Y(t-1)\), and an MA(2) process, \(X(t)=Y(t)+0.7Y(t-1)-0.2Y(t-2)\):
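A minimal sketch using arima.sim() (the seed and sample size are assumptions; arima.sim() uses the convention \(X(t)=Y(t)+\theta_1 Y(t-1)+\dots\), so the MA(1) coefficient is entered as -0.8):

```r
# simulate the MA(1) and MA(2) processes defined above
set.seed(1)
ma1 <- arima.sim(model = list(ma = -0.8), n = 500)          # X(t) = Y(t) - 0.8 Y(t-1)
ma2 <- arima.sim(model = list(ma = c(0.7, -0.2)), n = 500)  # X(t) = Y(t) + 0.7 Y(t-1) - 0.2 Y(t-2)
par(mfrow = c(2, 2))
plot.ts(ma1); acf(ma1)   # sample ac.f. cuts off after lag 1
plot.ts(ma2); acf(ma2)   # sample ac.f. cuts off after lag 2
```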
there are no requirements on the \(\{\beta_i\}\) for an MA process to be stationary, but it is generally desirable to impose some restrictions so that the process is invertible. the imposition of the invertibility condition ensures that there is a unique invertible first-order MA process for a given autocorrelation function.
let the sequence \(\{Y(t)\}\) be a discrete-time, purely random process with \(Y(t) \sim f(0, \sigma^2)\). \(X(t)\) is an autoregressive process of order \(p\), AR(p), if \(X(t)=\alpha_1X(t-1)+\dots+\alpha_pX(t-p)+Y(t)\).
for the AR(1) case,
\[ \begin{aligned} X(t) &= \alpha X(t-1)+Y(t) \\ &= \alpha(\alpha X(t-2)+Y(t-1))+Y(t) \\ &= \alpha[\alpha (\alpha X(t-3)+Y(t-2))+Y(t-1)]+Y(t) = \alpha^3X(t-3)+\alpha^2Y(t-2)+\alpha Y(t-1) +Y(t) \\ &=Y(t) + \alpha Y(t-1) + \alpha^2Y(t-2)+\dots \end{aligned} \] where \(-1 < \alpha < 1\). \(E[X(t)]=0\), \(var[X(t)]=\sigma^2 (1+\alpha^2 + \alpha^4 + \cdots)=\sigma^2/(1-\alpha^2)\), and \(\gamma(k)=\alpha^k\sigma^2/(1-\alpha^2)\), which does not depend on time t; thus an AR(1) process is second-order stationary provided that \(|\alpha|<1\).
three AR(1) examples are given below with \(\alpha=0.8, -0.8, 0.3\):
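A sketch of how these three AR(1) series can be simulated (the seed and sample size are arbitrary choices):

```r
# simulate AR(1) processes with alpha = 0.8, -0.8, 0.3 and plot their correlograms
set.seed(1)
par(mfrow = c(3, 2))
for (a in c(0.8, -0.8, 0.3)) {
  x <- arima.sim(model = list(ar = a), n = 500)
  plot.ts(x, main = paste("AR(1), alpha =", a))
  acf(x, main = paste("sample ac.f., alpha =", a))
}
```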
Note how quickly the ac.f. decays when \(\alpha = 0.3\), and note how the ac.f. alternates when \(\alpha < 0\).
AR processes have been applied to many situations in which it is reasonable to assume that the present value of a time series depends linearly on the immediate past values together with a random noise.
combining AR and MA processes gives rise to a mixed autoregressive/moving-average process with p AR terms and q MA terms, denoted by ARMA(p, q) and formulated as: \[ X(t)= \alpha_1X(t-1)+\dots+\alpha_pX(t-p)+Y(t)+\beta_1Y(t-1)+\dots+\beta_q Y(t-q) \]
the importance of the ARMA model lies in the fact that a stationary time series can often be modelled with fewer parameters by an ARMA model than by a pure AR or MA process [principle of parsimony].
consider ARMA(1,1) and ARMA(2,2):
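A sketch of simulating these two mixed processes (the coefficient values, seed, and sample size are illustrative assumptions):

```r
# simulate an ARMA(1,1) and an ARMA(2,2) process
set.seed(1)
arma11 <- arima.sim(model = list(ar = 0.6, ma = -0.4), n = 500)
arma22 <- arima.sim(model = list(ar = c(0.4, -0.3), ma = c(-0.6, 0.5)), n = 500)
par(mfrow = c(2, 2))
plot.ts(arma11); acf(arma11)
plot.ts(arma22); acf(arma22)
```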
non-stationary time series are the most common in practice. to use the stationary models mentioned above, we need to first remove the non-stationary sources of variation.
such a model is called an “integrated” model, since the stationary model fitted to the differenced data has to be summed, or integrated, to provide an adequate model for the original non-stationary data; the differencing is written as \(W(t)=\Delta^dX(t)=(1-B)^dX(t)\), where B denotes the backward shift operator.
the general autoregressive integrated moving average (ARIMA) process takes the form of
\[ W(t)=\alpha_1W(t-1)+\dots+\alpha_pW(t-p)+Y(t)+\beta_1Y(t-1)+\dots+\beta_qY(t-q) \] the differencing step in ARIMA models is commonly used to remove a trend from the data. if \(d=0\), an ARIMA(p,d,q) model reduces to an ARMA(p, q) process. An example of an ARIMA(1,1,1) process is given below:
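A sketch of simulating an ARIMA(1,1,1) series with arima.sim(), which accepts the full order including the degree of differencing (the coefficients, seed, and length are illustrative assumptions):

```r
# simulate an ARIMA(1,1,1) process: the d = 1 component integrates (cumulatively
# sums) an underlying ARMA(1,1) series, producing a trend-like, non-stationary path
set.seed(1)
x <- arima.sim(model = list(order = c(1, 1, 1), ar = 0.6, ma = 0.4), n = 500)
plot.ts(x)
acf(x)   # slowly decaying sample ac.f., reflecting the non-stationarity
```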
Note that the ac.f. of the two integrated processes is slowly decaying because of the trend. ARIMA models can be generalized to include seasonal terms.
a variant of ARIMA models arises with the use of fractional differencing, leading to the fractionally integrated ARMA (ARFIMA) model. an ARFIMA model is exactly the same as an ARIMA(p,d,q) model, except that d is no longer restricted to being an integer.
if \(0 < d < 0.5\), an ARFIMA model is of particular interest because such a process is not only stationary but also a long-memory model. by long memory, we mean that the autocorrelation decays very slowly. for most stationary time-series models the autocorrelation function decreases fairly fast; for example, the autocorrelation of an AR(1) model decays at an exponential rate. for a long-memory model, however, the correlations decay to 0 so slowly that observations far apart are still correlated.
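The object names printed below suggest a series simulated with fracdiff.sim() from the fracdiff package; a minimal sketch (the sample size and the value d = 0.4 are assumptions):

```r
require(fracdiff)                          # provides fracdiff.sim()
set.seed(1)
lm.sim <- fracdiff.sim(n = 1000, d = 0.4)  # ARFIMA(0, d, 0) with 0 < d < 0.5
names(lm.sim)                              # the simulated series is lm.sim$series
acf(lm.sim$series, lag.max = 100)          # autocorrelations decay very slowly
```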
## Loading required package: fracdiff
## [1] "series" "ar" "ma" "d" "mu" "n.start"
A long-memory (stationary) process and a non-stationary process: distinguishing a stationary, long-memory process from a non-stationary process is difficult in practice, since both have an autocorrelation function that dies out slowly and large values at zero frequency in the spectrum.
any discrete-time stationary process can be expressed as the sum of two uncorrelated processes, one purely deterministic and one purely indeterministic (the Wold decomposition):
the theoretical autocorrelation function is an important descriptive tool for the properties of a stationary stochastic process.
provided the time series is stationary, sample autocorrelation function is an intuitive estimator of the theoretical autocorrelation function.
given observations \(x_1, \dots, x_N\) from a stationary process, the sample autocovariance coefficient at lag k is given by
\[ c_k=\sum_{t=1}^{N-k}(x_t-\bar{x})(x_{t+k}-\bar{x})/N \]
which is an intuitive estimator for \(\gamma(k)\) at lag k. however, this estimator is biased, \(E[c_k] \ne \gamma(k)\); but \(\lim_{N\rightarrow \infty}E[c_k]=\gamma(k)\), thus the sample autocovariance function is asymptotically unbiased.
to estimate the autocorrelation function, we take \(r_k=c_k/c_0\) as an estimator for \(\rho(k)\). \(r_k\) is generally biased; its sampling properties are easiest to study when sampling from a purely random process, where all theoretical autocorrelation coefficients are 0 except at lag 0.
\[ E[r_k] \approx -1/N, \quad var[r_k] \approx 1/N \]
- assess the randomness of a stochastic process:
\(r_k\) is approximately normally distributed as \(N \rightarrow \infty\) under weak conditions. plotting the limits \(\pm\frac{1.96}{\sqrt{N}}\) together with the sample autocorrelation coefficients helps determine whether the process is truly random. note that this is at the \(\alpha=5\%\) level, so if we plot, say, 20 coefficients we expect about 1 of them to fall outside the limits by chance; if we observe more than 1 significant coefficient, we conclude the process is not random.
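A sketch illustrating both points: the sample autocovariance/autocorrelation computed "by hand" agrees with acf(), and the correlogram of a purely random series stays mostly inside the \(\pm 1.96/\sqrt{N}\) limits (the example series and the lag k are arbitrary choices):

```r
set.seed(1)
x <- arima.sim(model = list(ar = 0.5), n = 200)
N <- length(x); k <- 3
ck <- sum((x[1:(N - k)] - mean(x)) * (x[(k + 1):N] - mean(x))) / N   # sample autocovariance c_k
rk <- ck / (sum((x - mean(x))^2) / N)                                # r_k = c_k / c_0
c(ck, rk)
acf(x, plot = FALSE)$acf[k + 1]   # matches rk
# assessing randomness: acf() draws the approximate 95% limits at +/- 1.96 / sqrt(N)
z <- rnorm(200)                   # a purely random series
acf(z)                            # about 1 in 20 spikes may exceed the limits by chance
```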
the correlogram is also helpful in trying to identify appropriate models for a given time series realization. in particular, it is good for selecting the most appropriate type of auto-regressive integrated moving average (ARIMA) model.
although the sample mean is used when estimating the sample autocovariance and autocorrelation functions, it is a potentially misleading statistic unless all systematic components have been removed; thus the sample mean should only be used when the data can be assumed to come from a stationary process.
after estimating the autocorrelation function of a given time series, the next step is to decide which model is a suitable description of the underlying stochastic process.
if an AR process seems appropriate, there are two related questions:
consider an AR(p) process with mean \(\mu\):
\[ X(t)-\mu=\alpha_1(X(t-1)-\mu)+\dots+\alpha_p (X(t-p)-\mu)+Y(t) \]
given N obs. \((x_1, \dots, x_N)\), the parameters \(\mu, \alpha_1, \dots, \alpha_p\) can be estimated by least squares, minimizing \(S=\sum_{t=p+1}^{N}[X(t)-\mu-\alpha_1(X(t-1)-\mu)-\dots-\alpha_p (X(t-p)-\mu)]^2\) w.r.t. \(\mu, \alpha_1, \dots, \alpha_p\). if \(Y(t)\) is normally distributed, then the least-squares estimates are also maximum likelihood estimates, conditional on the first p values of the time series being fixed.
it is generally hard to determine the order of an AR process based solely on the sample autocorrelation function; the sample partial autocorrelation function, which cuts off after lag p for an AR(p) process, is more useful here.
example: use the above procedure to fit AR processes to two simulated series, an AR(2) and an AR(3):
\[ \begin{aligned} X(t) &=-0.4X(t-1)+0.3X(t-2)+Y(t), \quad Y(t) \sim N(0,1) \\ X(t) &=0.25X(t-1)+0.5X(t-2)-0.125X(t-3)+Y(t), \quad Y(t) \sim N(0,1) \end{aligned} \]
Note that both sample ac.f.’s decay rapidly, and the sample partial ac.f.’s cut off at lags 2 and 3, respectively. the parameters of an AR process can be estimated in R using the arima() function:
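For example, the AR(2) series above can be simulated and fitted as follows (the seed, sample size, and object name x1 are assumptions, so the estimates will differ slightly from the output shown below):

```r
set.seed(1)
x1 <- arima.sim(model = list(ar = c(-0.4, 0.3)), n = 500)   # the AR(2) series above
arima(x1, order = c(2, 0, 0))
# the AR(3) series is handled the same way with order = c(3, 0, 0)
```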
##
## Call:
## arima(x = x1, order = c(2, 0, 0))
##
## Coefficients:
## ar1 ar2 intercept
## -0.4379 0.2483 0.0404
## s.e. 0.0434 0.0434 0.0363
##
## sigma^2 estimated as 0.9347: log likelihood = -692.85, aic = 1391.7
the fitted 2nd-order process is \(X(t)-\hat{\mu}=-0.4379\,(X(t-1)-\hat{\mu})+0.2483\,(X(t-2)-\hat{\mu})+Y(t)\), \(Y(t) \sim N(0, 0.9347)\), with \(\hat{\mu}=0.0404\) (the “intercept” reported by arima() is the estimated mean of the series).
suppose an MA process is thought to be a suitable model for a given time series; we then have two problems:
consider the MA(1) process with mean \(\mu\): \(X(t)-\mu=Y(t)+\beta_1Y(t-1)\), where \(\mu, \beta_1\) are to be estimated and \(Y(t)\) denotes a purely random process.
explicit least-squares estimates cannot be found; a practical solution is to compute the residuals \(Z(t)=X(t)-\hat{\mu}-\hat{\beta}_1Z(t-1)\) recursively from starting values of the parameters (with \(Z(0)=0\)), and then determine the parameter values that minimize \(\sum Z(t)^2\).
if an MA process is suitable for a time series, the order of the process is usually evident from the sample autocorrelation function: the theoretical autocorrelation function of an MA(q) process cuts off at lag q.
an example of using the above procedure to fit an MA process to two simulated series:
\[ \begin{aligned} X(t) & = Y(t)-0.5Y(t-1)+0.4Y(t-2), \quad Y(t)\sim N(0,1) \\ X(t) & = Y(t)+\frac{1}{3}Y(t-1)+\frac{1}{5}Y(t-2)-\frac{1}{9}Y(t-3), \quad Y(t)\sim N(0,1) \end{aligned} \]
Depending on whether one assumes the mean of the series is zero or not, parameters of an MA process with zero or nonzero mean can be estimated using:
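A sketch for the first simulated MA(2) series (the seed, sample size, and name x1 are assumptions); include.mean switches between the nonzero-mean and zero-mean fits:

```r
set.seed(1)
x1 <- arima.sim(model = list(ma = c(-0.5, 0.4)), n = 500)
arima(x1, order = c(0, 0, 2))                         # MA(2) with estimated mean
arima(x1, order = c(0, 0, 2), include.mean = FALSE)   # MA(2) with mean fixed at zero
```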
##
## Call:
## arima(x = x1, order = c(0, 0, 2))
##
## Coefficients:
## ma1 ma2 intercept
## -0.4216 0.3597 0.0405
## s.e. 0.0428 0.0387 0.0424
##
## sigma^2 estimated as 1.021: log likelihood = -714.92, aic = 1435.83
the estimated model is \(X(t) = 0.0405 + Y(t)-0.4216Y(t-1)+0.3597Y(t-2), \quad Y(t)\sim N(0,1.021)\).
fitting an ARMA model follows a similar iterative procedure for parameter estimation: the residual sum of squares can be calculated at every point on a grid of parameter values, and the combination giving the minimum sum of squares is the one we choose.
consider an ARMA(1,1) process, whose theoretical autocorrelation function decreases exponentially after lag 1:
\[ \begin{aligned} X(t)-\mu & =\alpha_1(X(t-1)-\mu)+Y(t)+\beta_1Y(t-1) \Rightarrow \\ Y(t) &= X(t)-\mu-\alpha_1(X(t-1)-\mu)-\beta_1Y(t-1) \end{aligned} \] given N obs., we guess values for \(\hat{\mu}, \hat{\alpha}_1, \hat{\beta}_1\) and then calculate the residuals recursively:
\[ Y(t) = X(t)-\hat{\mu}-\hat{\alpha}_1[X(t-1)-\hat{\mu}]-\hat{\beta}_1Y(t-1) \] the residual sum of squares \(\sum_{t=1}^{N}Y^2(t)\) is calculated and compared across parameter values until the minimum is found.
an example of an ARMA(2,2) process with N=500 and \(Y(t) \sim N(0, 1)\) is given as:
\[ X(t)=0.4X(t-1)-0.3X(t-2)+Y(t)-0.6Y(t-1)+0.5Y(t-2), \quad Y(t)\sim N(0,1) \]
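A sketch of how the series (here called x5, as in the fits below) can be simulated and fitted with both an under-specified ARMA(1,1) and the correct ARMA(2,2) order; the seed is arbitrary, so the estimates will differ slightly from those shown:

```r
set.seed(1)
x5 <- arima.sim(model = list(ar = c(0.4, -0.3), ma = c(-0.6, 0.5)), n = 500)
arima(x5, order = c(1, 0, 1))                         # ARMA(1,1), order too low
arima(x5, order = c(2, 0, 2), include.mean = FALSE)   # ARMA(2,2), mean fixed at zero
```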
##
## Call:
## arima(x = x5, order = c(1, 0, 1))
##
## Coefficients:
## ar1 ma1 intercept
## -0.2686 0.0139 0.0524
## s.e. 0.1291 0.1302 0.0352
##
## sigma^2 estimated as 0.9686: log likelihood = -701.51, aic = 1409.03
##
## Call:
## arima(x = x5, order = c(2, 0, 2), include.mean = F)
##
## Coefficients:
## ar1 ar2 ma1 ma2
## 0.2198 -0.3710 -0.4722 0.5362
## s.e. 0.1695 0.1503 0.1579 0.1294
##
## sigma^2 estimated as 0.9511: log likelihood = -697.03, aic = 1402.05
the fitted ARMA(2,2) model with mean 0 is: \(X(t)=0.2198X(t-1)-0.3710X(t-2)+Y(t)-0.4722Y(t-1)+0.5362Y(t-2)\), and the 95% confidence intervals cover the true values of the coefficients, suggesting the fitted model is not very different from the true model.
two standard tools used to identify an appropriate ARMA model for a given stationary time series:
the above model selection tools are subjective, matching a suitable model to the observed characteristics by eye.
objective tools are:
an example is given below:
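A sketch of how such an AIC table can be computed for the series x5, looping over candidate orders p, q = 0, ..., 3 (the grid limits are assumptions based on the 4 x 4 table below):

```r
# AIC for ARMA(p, q) fits with p, q = 0, ..., 3; row index is p + 1, column index is q + 1
aic.tab <- matrix(NA, 4, 4)
for (p in 0:3) {
  for (q in 0:3) {
    aic.tab[p + 1, q + 1] <- arima(x5, order = c(p, 0, q))$aic
  }
}
aic.tab
```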
## Warning in stats::arima(x = x, order = order, seasonal = seasonal, xreg = xreg,
## : possible convergence problem: optim gave code = 1
## [,1] [,2] [,3] [,4]
## [1,] 1438.120 1410.682 1405.330 1402.834
## [2,] 1407.041 1409.029 1405.072 1404.575
## [3,] 1409.020 1409.781 1402.372 1403.857
## [4,] 1405.134 1406.756 1403.851 1405.852
a lower AIC is preferred; p=q=2 gives the lowest AIC here.
in practice, it is hard to distinguish non-stationary from nearly non-stationary time series: the sample ac.f.’s of both decay (very) slowly towards zero.
Long-memory stationary processes are ‘nearly non-stationary’. for example, in short series the (stationary) AR(1) process with parameter 0.95, \(X(t)=0.95X(t-1)+Y(t)\), looks much like data generated by a (non-stationary) random walk, \(X(t)=X(t-1)+Y(t)\): both the time plots and the autocorrelation functions look similar.
model misspecification generally matters little for short-term forecasts, but it produces substantial differences in long-term forecasts; thus, it is important to distinguish between the two types of processes.
solution: apply a formal unit-root test, such as the augmented Dickey-Fuller test, of the null hypothesis that the differencing parameter d is exactly equal to one:
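A sketch of such a test using adf.test() from the tseries package (the package choice and the example series y1 are assumptions; the ADF null hypothesis is a unit root, i.e. d = 1, against the alternative of stationarity):

```r
library(tseries)                                  # provides adf.test()
set.seed(1)
y1 <- arima.sim(model = list(ar = 0.8), n = 500)  # a hypothetical stationary series
adf.test(y1)                                      # small p-value => reject the unit root
```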
## Augmented Dickey-Fuller Test
##
## data: y1
## Dickey-Fuller = -3.9493, Lag order = 7, p-value = 0.01154
## alternative hypothesis: stationary
Unfortunately, tests for unit roots generally have poor power, even for moderate size samples.
many time series are clearly non-stationary. thus we difference an observed time series until it appears to be stationary, and then we fit an AR, MA, or ARMA model to the differenced series.
generally, first-order differencing of non-seasonal data is adequate, while second-order differencing is sometimes needed. for seasonal time series, seasonal differencing may also be needed.
the possible use of differencing is the only change from fitting ARMA models:
many time series contain a seasonal periodic component which repeats every s observations. the generalized ARIMA model (SARIMA) can deal with seasonality, and is denoted SARIMA(p,d,q) × (P,D,Q)_s.
The sales series exhibits both a trend and seasonal variation. the following shows how to use Box-Jenkins models to fit a time series model to data containing a seasonal component and a trend component:
Note that the differenced series does not show any seasonal effect in the time series plot; the sample ac.f. and sample partial ac.f. of Y(t) are much smaller than those of the original series X(t).
We then try to fit an ARMA model to Y(t). Since the order of the ARMA model is not given, we fit a set of ARMA models and compute their AICs to select the best order:
## [,1] [,2] [,3] [,4] [,5]
## [1,] 1648.682 1650.504 1651.521 1648.854 1635.755
## [2,] 1650.489 1652.140 1653.125 1642.301 1636.960
## [3,] 1652.209 1649.112 1641.001 1628.410 1630.405
## [4,] 1649.993 1643.663 1630.891 1630.401 1632.360
## [5,] 1636.412 1637.506 1629.802 1631.801 1631.935
The model with minimum AIC (1628.410) is ARMA(p, q) with p = 2 and q = 3. Hence we fit an ARMA(2,3) model to the differenced series Y(t):
##
## Call:
## arima(x = dwine, order = c(2, 0, 3))
##
## Coefficients:
## ar1 ar2 ma1 ma2 ma3 intercept
## -1.0226 -0.8662 1.1775 1.1854 0.4210 -213.3700
## s.e. 0.0691 0.0717 0.1011 0.1348 0.0915 35.2152
##
## sigma^2 estimated as 82830: log likelihood = -808.21, aic = 1628.41
the fitted model is \(Y(t)-\hat{\mu}=-1.0226(Y(t-1)-\hat{\mu})-0.8662(Y(t-2)-\hat{\mu})+Z(t)+1.1775Z(t-1)+1.1854Z(t-2)+0.4210Z(t-3)\), with \(\hat{\mu}=-213.37\) and \(Z(t) \sim N(0,82830)\) (the “intercept” reported by arima() is the estimated mean). the next step is residual diagnostics.
When a model has been fitted to a time series, it is important to check that the model really does provide an adequate description of the data. this is usually done by looking at the residuals, defined by residual = observation - fitted value.
consider the AR(1) model, \(X(t)=\alpha X(t-1)+Y(t), Y(t)\sim N(0,1)\), the fitted value is given by \(\hat{X}(t)=\hat{\alpha}x(t-1)\) and the residual at time t is given by \(\hat{Y}(t)=x(t)-\hat{\alpha}x(t-1)\).
if we have chosen a good model, we expect the residuals to behave like a purely random process centered around 0, since the residuals estimate the noise terms of the true model. plotting the residuals (for time series, as a time series plot) shows whether this is the case. this gives a two-step procedure for residual diagnostics:
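Beyond the plots, a portmanteau test of the residual autocorrelations can be carried out with Box.test(); a sketch for the wine model fitted above (assuming the fit is stored in dwine.fit):

```r
# Box-Pierce test on the residuals of the fitted ARMA(2,3) model
Box.test(dwine.fit$resid)   # defaults: lag = 1, type = "Box-Pierce"
```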
##
## Box-Pierce test
##
## data: dwine.fit$resid
## X-squared = 0.00089997, df = 1, p-value = 0.9761
Suppose we have an observed time series, \(x_1, \dots, x_N\). a basic problem is then to estimate future values such as \(x_{N+h}\), where h denotes the lead time or forecasting horizon. the forecast made at time N for h steps ahead is denoted by \(\hat{x}_N(h)\).
Forecasts are conditional statements about the future based on specific assumptions. It is advised to produce a set of forecasts based on different sets of assumptions so that alternative scenarios can be compared and explored.
forecasting methods can be classified into 4 categories:
intuitively, we forecast \(x_{N+1}\) by a weighted sum of past observations:
\[ \hat{x}_{N+1}=c_0x_N+\dots+c_{N-1}x_1 \]
where \(\{c_i\}\) are weights; it is sensible to assign bigger weights to more recent observations while keeping \(\sum_{i=0}^{N-1}c_i=1\). an example is the geometric weights:
\[ c_i=\alpha(1-\alpha)^i, \quad i=0,1,\dots, N-1, \alpha \in (0,1) \]
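These geometric weights correspond to simple exponential smoothing; in R this can be fitted with HoltWinters() with the trend and seasonal components switched off (the example series below is hypothetical):

```r
set.seed(1)
x <- arima.sim(model = list(ar = 0.7), n = 200)      # a hypothetical series to forecast
fit <- HoltWinters(x, beta = FALSE, gamma = FALSE)   # simple exponential smoothing
fit$alpha                                            # estimated smoothing parameter
predict(fit, n.ahead = 5)                            # forecasts for the next 5 steps
```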
simple exponential smoothing can be generalized to deal with time series containing a systematic trend and seasonal variation (the Holt-Winters procedure).
Box and Jenkins showed how the use of differencing can extend ARMA models to ARIMA to deal with non-stationary series; in addition, they also showed how to cope with seasonal variation by introducing seasonal ARIMA (SARIMA) models.
the main stages in setting up a Box-Jenkins forecasting model are:
now we use the Box-Jenkins procedure to analyze the monthly unemployment rates in the U.S. from January 1948 to October 2018:
To find a time series model with good out-of-sample performance, we split the time series into a training sample and test data.
the deseasonalized series is very persistent suggesting the series is not stationary. consider a unit-root test for the deseasonalized series:
##
## Augmented Dickey-Fuller Test
##
## data: unem.desea
## Dickey-Fuller = -3.6198, Lag order = 9, p-value = 0.03078
## alternative hypothesis: stationary
the p-value is smaller than 0.05, thus we reject the null and conclude the deseasonalized series is stationary. an ARIMA(p,1,q) model is then considered; next we select a good model using AIC:
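A sketch of the AIC search over ARIMA(p, 1, q) orders for the deseasonalized training series (the grid limits p, q = 0, ..., 3 are assumptions based on the 4 x 4 table below):

```r
aic.unem <- matrix(NA, 4, 4)
for (p in 0:3) {
  for (q in 0:3) {
    aic.unem[p + 1, q + 1] <- arima(unem.desea.series, order = c(p, 1, q))$aic
  }
}
aic.unem
```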
## [,1] [,2] [,3] [,4]
## [1,] -104.6192 -123.3318 -166.0700 -170.9881
## [2,] -132.6713 -161.3643 -174.0880 -172.2128
## [3,] -173.1796 -173.1427 -172.1949 -171.0363
## [4,] -174.0392 -176.2756 -175.2259 -173.7914
an ARIMA(3,1,1) model, which has the lowest AIC, seems to fit the deseasonalized series well.
##
## Call:
## arima(x = unem.desea.series, order = c(3, 1, 1))
##
## Coefficients:
## ar1 ar2 ar3 ma1
## -0.4935 0.3039 0.2109 0.6296
## s.e. 0.1517 0.0434 0.0406 0.1516
##
## sigma^2 estimated as 0.04699: log likelihood = 92.14, aic = -176.28
the fitted model for the deseasonalized series takes the form \(Y(t)=\widetilde{X}(t)-\widetilde{X}(t-1)\), \(Y(t)=-0.4935Y(t-1)+0.3039Y(t-2)+0.2109Y(t-3)+Z(t)+0.6296Z(t-1)\), where \(Z(t) \sim N(0,0.04699)\).
to perform residual diagnostics, we plot the time series of the standardized residuals \(\hat{Z}(t)\), their sample autocorrelation function, and the p-values of the Ljung-Box statistic.
##
## Box-Pierce test
##
## data: fit.arima311$resid
## X-squared = 0.0072558, df = 1, p-value = 0.9321
the p-value of the test is much larger than 0.05, indicating no significant autocorrelation in the residuals, so the fitted model appears adequate.
We then use the fitted model to forecast the rates for January, February, ..., October 2018. We compute the \(k\)-step-ahead forecasts and their \(95\%\) prediction intervals for \(\widetilde{X}(t)\) at \(k=1, \dots, 10\). To obtain predicted values on the original scale, the seasonal effects need to be added back:
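A sketch of the forecasting step (assuming the fit is stored in fit.arima311 and the estimated monthly seasonal effects in a hypothetical vector seas.eff, in calendar order):

```r
fc <- predict(fit.arima311, n.ahead = 10)   # 1- to 10-step-ahead forecasts for the deseasonalized series
upper <- fc$pred + 1.96 * fc$se             # approximate 95% prediction limits
lower <- fc$pred - 1.96 * fc$se
fc.rates <- fc$pred + seas.eff[1:10]        # add the seasonal effects back for Jan-Oct 2018
```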