Time Series Methods

Sanjaya Desilva
1 Dynamic Models
In estimating time series models, we sometimes need to model the temporal
relationships between variables explicitly. Does X affect Y in the same period, or with
a lag of one or more periods? Do past values of Y affect the current value of Y?
Economic variables often influence each other with lags. See Gujarati, Ch. 17 for
several examples and explanations of the reason for lags and inertia.
We consider two types of dynamic models: 1) distributed lag models, and 2)
autoregressive models.
1.1 Distributed Lag Model
In the distributed lag model, the current value of Y is influenced by the current value
of X as well as its lags.
$$Y_t = \beta + \beta_0 X_t + \beta_1 X_{t-1} + \beta_2 X_{t-2} + \cdots + \beta_k X_{t-k} + \varepsilon_t \tag{1}$$
The estimation of this model allows us to trace the dynamics of the effect of
X on Y. The total, or long-run, effect of X on Y is the sum of the coefficients on the current and lagged values of X,
$$\beta^* = \beta_0 + \beta_1 + \cdots + \beta_k \tag{2}$$
If we want to know whether all the lags have a collective effect on Y, we can perform an F-test with appropriate restrictions to capture the null hypothesis. Question:
Set up this F-test.
There are three problems associated with distributed lag models.
1. We need to decide how many lags to use. Because the multiplier effects generally
diminish over time, it makes theoretical sense to use a finite number of lags.
2. The lags of the same X variable are likely to be highly correlated. This may
lead to multicollinearity problems. Question: What are the consequences of
multicollinearity?
3. If the sample size is small, as is generally the case with time series data sets,
the use of multiple lags erodes degrees of freedom. Question: What are the
consequences of low degrees of freedom?
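As an illustration, here is a minimal sketch of estimating a distributed lag model by OLS on simulated data, assuming NumPy is available. The helper `lag_matrix`, the sample size, and the true coefficients are hypothetical choices made for this example and do not come from the notes. It shows that using k lags costs k observations and that the long-run effect can be obtained by summing the estimated coefficients on X and its lags.

```python
import numpy as np

def lag_matrix(x, k):
    """Return columns [x_t, x_{t-1}, ..., x_{t-k}] for t = k, ..., T-1."""
    T = len(x)
    return np.column_stack([x[k - j : T - j] for j in range(k + 1)])

rng = np.random.default_rng(0)
T, k = 200, 2
true_beta = np.array([1.0, 0.6, 0.3])      # hypothetical coefficients on x_t, x_{t-1}, x_{t-2}
x = rng.normal(size=T)
X_lags = lag_matrix(x, k)                  # shape (T - k, k + 1): k observations are lost
y = 2.0 + X_lags @ true_beta + rng.normal(scale=0.5, size=T - k)

# OLS with an intercept: regress y on [1, x_t, x_{t-1}, x_{t-2}]
design = np.column_stack([np.ones(T - k), X_lags])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

long_run = coef[1:].sum()                  # sum of the coefficients on x_t and its lags
dof = design.shape[0] - design.shape[1]    # observations minus estimated parameters
print("estimated lag coefficients:", coef[1:].round(3))
print("long-run effect:", round(long_run, 3), "| degrees of freedom:", dof)
```

Every additional lag removes one observation and adds one parameter, so the degrees of freedom fall by two per lag.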
1.2 Autoregressive Model
The other type of dynamic model we have learned is the autoregressive model. Here,
the Y variable has persistent effects on itself. For example, current consumption
depends on past consumption, and the current stock price depends on the past stock
price. A first-order autoregressive model can be written as
$$Y_t = \beta_0 + \beta_1 X_t + \beta_2 Y_{t-1} + \varepsilon_t \tag{3}$$
2 Koyck Transformation
Koyck demonstrated that, under certain conditions, the distributed lag model can be
expressed as an autoregressive model. By making this transformation, Koyck argued
that the three pitfalls associated with the distributed lag model (see above) can be
avoided. To make this transformation, we need to assume an infinite lag model
with the following structure for the lags.
$$\beta_k = \beta_0 \lambda^k \tag{4}$$
where $\lambda$ is known as the rate of decline or decay, and $0 < \lambda < 1$.
Question: Is this lag structure realistic? Why is it necessary to have $0 < \lambda < 1$?
With this lag structure, the infinite distributed lag model can be rewritten as
$$Y_t = \beta + \beta_0 [X_t + \lambda X_{t-1} + \lambda^2 X_{t-2} + \cdots] + \varepsilon_t \tag{5}$$
Question: In a model with infinite Koyck lags, the combined, or long-run, effect
of X on Y is $\beta_0 / (1 - \lambda)$.
Why? (Hint: Use the expression for the sum of an infinite geometric series.)
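A quick numerical check can make the hint concrete. The sketch below, which assumes NumPy and uses hypothetical values for β0 and λ, adds up the Koyck lag effects one lag at a time and compares the running total with the closed-form long-run effect.

```python
import numpy as np

beta0, lam = 2.0, 0.6                      # hypothetical values with 0 < lam < 1
k = np.arange(200)                         # lags 0, 1, ..., 199
lag_effects = beta0 * lam**k               # effect of the k-th lag: beta0 * lam^k
partial_sums = np.cumsum(lag_effects)      # cumulative effect up to each lag

long_run = beta0 / (1 - lam)               # closed-form limit of the series
print("partial sums after 1, 2, 6, 21 terms:", partial_sums[[0, 1, 5, 20]].round(4))
print("closed-form long-run effect:", long_run)
```

Repeating the exercise with a value of λ at or above 1 shows the partial sums growing without bound, which is one way to see why the restriction on λ matters.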
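If we lag both sides of equation (5) by one period and multiply by $\lambda$,
we get
$$\lambda Y_{t-1} = \lambda\beta + \beta_0 [\lambda X_{t-1} + \lambda^2 X_{t-2} + \lambda^3 X_{t-3} + \cdots] + \lambda \varepsilon_{t-1} \tag{6}$$
By subtracting this expression from the previous one, we get
$$Y_t - \lambda Y_{t-1} = \beta(1 - \lambda) + \beta_0 X_t + \varepsilon_t - \lambda \varepsilon_{t-1} \tag{7}$$
Rearranging terms, we get a familiar autoregressive model,
$$Y_t = \beta(1 - \lambda) + \beta_0 X_t + \lambda Y_{t-1} + \varepsilon_t - \lambda \varepsilon_{t-1} \tag{8}$$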
In the infinite distributed lag model, each lag effect was given by
$$\beta_k = \beta_0 \lambda^k \tag{9}$$
and the total long-run effect was given by
$$\frac{\beta_0}{1 - \lambda} \tag{10}$$
Therefore, if we can somehow obtain estimates for $\beta_0$ and $\lambda$ without having to
explicitly estimate an infinite distributed lag model, we can figure out the individual as
well as the combined effects of X on Y. The autoregressive model we just formulated
allows us to do just that. The coefficient of $X_t$ is an estimate of $\beta_0$ and the coefficient
of $Y_{t-1}$ is an estimate of $\lambda$. Note that the estimation of the Koyck autoregressive
model does not suffer from the three pitfalls associated with running distributed lag
models, because the combined effects of the infinite lags are captured by a single
autoregressive term. In that sense, this method is more parsimonious.
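The equivalence of the two forms can be checked numerically. The following sketch assumes NumPy, uses hypothetical parameter values, and truncates the infinite lag at K terms (an approximation whose error is of order λ^K). It builds Y from the distributed lag form and then confirms that the same Y satisfies the Koyck autoregressive form.

```python
import numpy as np

rng = np.random.default_rng(1)
T, K = 300, 80                   # K truncates the infinite lag; lam**K is negligible
beta, beta0, lam = 1.0, 2.0, 0.5 # hypothetical intercept, impact effect, decay rate
x = rng.normal(size=T + K)       # K extra pre-sample values of X
eps = rng.normal(size=T)

# Distributed lag form: Y_t = beta + beta0 * sum_k lam^k X_{t-k} + eps_t
weights = beta0 * lam ** np.arange(K + 1)
y = np.array([beta + weights @ x[t + K - np.arange(K + 1)] + eps[t]
              for t in range(T)])
x_t = x[K:]                      # X aligned with Y

# Koyck autoregressive form, evaluated for t = 1, ..., T-1
lhs = y[1:]
rhs = (beta * (1 - lam) + beta0 * x_t[1:] + lam * y[:-1]
       + eps[1:] - lam * eps[:-1])
max_gap = np.max(np.abs(lhs - rhs))
print("largest difference between the two forms:", max_gap)
```

The gap is at the level of floating-point rounding, so the single lagged Y term does carry the information in the whole tail of lagged X values.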
Unfortunately, we cannot obtain efficient and unbiased estimates for $\beta_0$ and $\lambda$ by
estimating the Koyck autoregressive function using OLS. There are two reasons for
this.
1. The error term in the Koyck AR model, $\varepsilon_t - \lambda \varepsilon_{t-1}$, is serially correlated.
2. The autoregressive lag variable $Y_{t-1}$ is endogenous, i.e. correlated with the
lagged error term $\varepsilon_{t-1}$, which is part of the model's error term.
Question: The first problem leads to inefficient estimates, whereas the second problem
leads to biased coefficient estimates. Why?
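A small simulation can illustrate both problems. This is only a sketch under stated assumptions: NumPy, hypothetical parameter values, an i.i.d. normal X, and i.i.d. normal shocks. It generates data from the Koyck autoregressive form, estimates it by OLS, and reports the estimate of λ together with the first-order autocorrelation of the composite error.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 5000
beta, beta0, lam = 1.0, 1.0, 0.5          # hypothetical true values
x = rng.normal(size=T)
eps = rng.normal(size=T)

# Composite error of the Koyck model: v_t = eps_t - lam * eps_{t-1}
v = eps.copy()
v[1:] -= lam * eps[:-1]

# Generate Y recursively from the autoregressive form
y = np.zeros(T)
y[0] = beta + beta0 * x[0] + eps[0]
for t in range(1, T):
    y[t] = beta * (1 - lam) + beta0 * x[t] + lam * y[t - 1] + v[t]

# OLS of Y_t on [1, X_t, Y_{t-1}]
design = np.column_stack([np.ones(T - 1), x[1:], y[:-1]])
coef, *_ = np.linalg.lstsq(design, y[1:], rcond=None)
lam_hat = coef[2]

# First-order autocorrelation of the composite error v_t
rho_v = np.corrcoef(v[2:], v[1:-1])[0, 1]
print("true lambda:", lam, "| OLS estimate:", round(lam_hat, 3))
print("first-order autocorrelation of v_t:", round(rho_v, 3))
```

In this design the OLS estimate of λ stays well below the true value even with thousands of observations, and the composite error is clearly negatively autocorrelated.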
There are more sophisticated methods that can be used to estimate such a model.
For example, the second problem can be addressed if we can find an appropriate
instrumental variable for the lagged Y variable. Question: What properties should
this instrumental variable have?
These methods are beyond the scope of this course.
3 Causality
We can use the idea of lags to establish causal relationships in time series data.
3.1 Granger Test for Causality
Suppose you want to test whether X causes Y. Granger proposed the estimation of
the following pair of equations,
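$$Y_t = \alpha_1 X_{t-1} + \cdots + \alpha_k X_{t-k} + \beta_1 Y_{t-1} + \cdots + \beta_k Y_{t-k} + u_{1t} \tag{11}$$
$$X_t = \lambda_1 Y_{t-1} + \cdots + \lambda_k Y_{t-k} + \delta_1 X_{t-1} + \cdots + \delta_k X_{t-k} + u_{2t} \tag{12}$$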
Consider $X_{t-3}$. This variable can influence $Y_t$ in two ways. The first is a direct
causal effect from $X_{t-3}$ to $Y_t$. The second is an indirect effect; for example, there
could be a causal effect from $X_{t-3}$ to $Y_{t-2}$, and then another causal effect from $Y_{t-2}$
to $X_{t-1}$ that then gets transmitted to $Y_t$. There are many such effects, but they all
go through a lagged Y variable. Therefore, once we control for all lags of Y, if the
lags of X still have significant effects on the current Y, we can conclude that there is
indeed a direct causal effect from X to Y. In other words, when we control for all
lags of Y, in effect controlling for all effects from lagged Xs to current Y via lagged Ys,
we isolate the direct causal effects from the lags of X to current Y. Therefore, if
the $\alpha_i$ are jointly significant, we can conclude that X (Granger) causes Y. Similarly,
if the $\lambda_i$ are jointly significant, we can conclude that Y (Granger) causes X.
Question: Construct a formal test to establish that X causes Y. (Hint: Use the
restrictions F-test).
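One possible way to carry out such a test is sketched below, assuming NumPy and simulated data in which X affects Y with a one-period lag but Y does not affect X. The helper names `ssr`, `lags`, and `granger_f` are hypothetical, and the regressions omit an intercept to match the pair of equations as written above. The restricted regression uses only the own lags of the dependent variable; the unrestricted regression adds the lags of the other variable.

```python
import numpy as np

def ssr(design, y):
    """Sum of squared OLS residuals from regressing y on design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return resid @ resid

def lags(z, k):
    """Columns [z_{t-1}, ..., z_{t-k}] for t = k, ..., T-1."""
    T = len(z)
    return np.column_stack([z[k - j : T - j] for j in range(1, k + 1)])

def granger_f(cause, effect, k):
    """F-statistic for H0: all k lags of `cause` have zero coefficients."""
    y = effect[k:]
    restricted = lags(effect, k)                          # own lags only
    unrestricted = np.column_stack([lags(cause, k), restricted])
    ssr_r, ssr_u = ssr(restricted, y), ssr(unrestricted, y)
    dof = len(y) - unrestricted.shape[1]
    return ((ssr_r - ssr_u) / k) / (ssr_u / dof)

# Simulated data: X drives Y with a one-period lag, but not the reverse
rng = np.random.default_rng(3)
T = 500
x = rng.normal(size=T)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 * x[t - 1] + 0.3 * y[t - 1] + rng.normal()

f_xy = granger_f(x, y, k=2)   # H0: X does not Granger-cause Y
f_yx = granger_f(y, x, k=2)   # H0: Y does not Granger-cause X
print("F (X -> Y):", round(f_xy, 2), "| F (Y -> X):", round(f_yx, 2))
```

Each statistic would be compared with the critical value of an F distribution with k and n − 2k degrees of freedom, where n is the number of usable observations.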
3.2 Sims Test for Causality
Sims proposed a simpler test for causality using leads and lags. He suggested estimating the following equation,
$$Y_t = \alpha + \beta_k X_{t-k} + \cdots + \beta_1 X_{t-1} + \beta_0 X_t + \lambda_1 X_{t+1} + \cdots + \lambda_k X_{t+k} + u_t \tag{13}$$
Here, we are regressing the Y variable in year t on the X variable in the same year, its k lags (preceding
years), and its k leads (succeeding years). Sims' insight was that if X causes Y, the lags of
X should influence Y but, controlling for the lags, the leads of X should not influence
Y. Therefore, if the $\lambda$s are jointly insignificant and the $\beta$s are jointly significant, we
can conclude that X causes Y.
Question: Construct the formal test for causality. Again, we need to use the F-test
for restrictions.
Note that unlike with the Granger model, we need data on lead years to carry out
this test. For example, if the Y variable is for 2000, we can use X variables from 1990
to 1999 (lags) and 2001 to 2007 (leads) to test for causality.
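The leads-and-lags regression can be sketched in the same way. The example below assumes NumPy and simulated data in which X affects Y with a one-period lag; the helper `ssr` and all parameter values are hypothetical. Note how the usable sample shrinks at both ends, because every observation needs k earlier and k later values of X.

```python
import numpy as np

def ssr(design, y):
    """Sum of squared OLS residuals from regressing y on design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return resid @ resid

rng = np.random.default_rng(4)
T, k = 400, 2
x = rng.normal(size=T)
y = np.zeros(T)
y[1:] = 0.8 * x[:-1] + rng.normal(size=T - 1)   # X affects Y with a one-period lag

# Usable sample: t = k, ..., T-k-1, so that every lag and lead exists
t_idx = np.arange(k, T - k)
const = np.ones(len(t_idx))
cur_and_lags = np.column_stack([x[t_idx - j] for j in range(k + 1)])   # X_t, ..., X_{t-k}
leads = np.column_stack([x[t_idx + j] for j in range(1, k + 1)])       # X_{t+1}, ..., X_{t+k}
y_s = y[t_idx]

full = np.column_stack([const, cur_and_lags, leads])
ssr_full = ssr(full, y_s)
dof = len(y_s) - full.shape[1]

# H0: all lead coefficients are zero (restricted model drops the leads)
f_leads = ((ssr(np.column_stack([const, cur_and_lags]), y_s) - ssr_full) / k) / (ssr_full / dof)
# H0: coefficients on X_t and all its lags are zero (restricted model drops them)
f_lags = ((ssr(np.column_stack([const, leads]), y_s) - ssr_full) / (k + 1)) / (ssr_full / dof)
print("F for leads:", round(f_leads, 2), "| F for current and lagged X:", round(f_lags, 2))
```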
4 Stationarity
We begin with some definitions.
1. A stochastic process is a collection of random variables ordered over time.
For example, GDP is a random variable from which we observe realizations
every year. The time series of such realizations is called a stochastic process.
2. A stationary stochastic process is a stochastic process whose mean and variance do not change over time. (There are additional conditions on the covariance
that we can ignore for present purposes.) For example, if the mean and variance of the distribution from which GDP is drawn every year remain
the same, we can call the GDP process stationary.
3. A nonstationary stochastic process is a stochastic process whose mean or
variance changes over time.
Question: Do you think the GDP is in fact stationary? How about the Dow Jones
index? The GDP growth rate? The inflation rate?
4.1 Consequences of Nonstationarity
Nonstationary variables tend to show distinct patterns over time. Therefore, even if
two nonstationary variables are not causally related, a regression of one on the other
can show a strong correlation simply because both variables have these patterns.
This scenario is called spurious correlation. For example, GDP fluctuates from
year to year, but it has a constant positive trend. So does the Dow Jones index.
Therefore, a regression of GDP on the Dow Jones index will yield a significant positive effect
even though these two variables may not be causally related.
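To see how this can happen, consider the following sketch, which assumes NumPy and uses two artificial series in place of GDP and the Dow Jones index. Each series is built independently as an upward trend plus noise (the slopes and noise levels are hypothetical), so there is no relationship between them by construction.

```python
import numpy as np

rng = np.random.default_rng(5)
T = 100
t = np.arange(T)
# Two series built independently; each has its own upward trend plus noise
a = 0.5 * t + rng.normal(size=T)
b = 0.3 * t + rng.normal(size=T)

# OLS of a on a constant and b, with the usual t-statistic and R^2
design = np.column_stack([np.ones(T), b])
coef, *_ = np.linalg.lstsq(design, a, rcond=None)
resid = a - design @ coef
sigma2 = resid @ resid / (T - 2)
cov = sigma2 * np.linalg.inv(design.T @ design)
t_stat = coef[1] / np.sqrt(cov[1, 1])
r2 = 1 - resid @ resid / np.sum((a - a.mean()) ** 2)
print("slope:", round(coef[1], 3), "| t-statistic:", round(t_stat, 1), "| R^2:", round(r2, 3))
```

The regression reports a very large t-statistic and a high R-squared, even though the only thing the two series share is the passage of time.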
5 Random Walks
Random walks are among the best-known examples of nonstationary processes. The
simplest random walk is a random walk without drift.
5.1 Random Walk without Drift
Consider the following stochastic process.
$$Y_t = Y_{t-1} + u_t \tag{14}$$
where $u_t$ is a random shock with mean 0 and variance $\sigma^2$. Economists who believe
in the efficient market hypothesis believe that stock prices are random walks without
drift.
Question: Why is this an AR1 process? What specific restriction must be imposed
on a standard AR1 process to obtain a random walk without drift?
Note that we can rewrite this equation as
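$$Y_t = Y_0 + u_1 + u_2 + \cdots + u_t \tag{15}$$
Question: Show how this can be done.
Question: Using the fact that $u_t \sim (0, \sigma^2)$ and that the shocks are uncorrelated across periods, show that
$$E(Y_t) = Y_0 \tag{16}$$
$$\mathrm{Var}(Y_t) = t\sigma^2 \tag{17}$$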
Question: Using what you just found, can you confirm that the random walk
without drift is a nonstationary process?
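These moments can also be checked by simulation. The sketch below assumes NumPy, normally distributed shocks, and hypothetical values for the starting point and the shock standard deviation. It simulates many independent random walks and compares the cross-path mean and variance at a few dates with the theoretical values.

```python
import numpy as np

rng = np.random.default_rng(6)
n_paths, T = 20000, 50
y0, sigma = 10.0, 1.0                        # hypothetical starting value and shock s.d.
shocks = rng.normal(scale=sigma, size=(n_paths, T))
paths = y0 + np.cumsum(shocks, axis=1)       # column t-1 holds Y_t = Y_0 + u_1 + ... + u_t

for t in (1, 10, 50):
    col = paths[:, t - 1]
    print(f"t={t:2d}  mean={col.mean():.3f}  variance={col.var():.3f}  "
          f"t*sigma^2={t * sigma**2:.1f}")
```

The mean stays near the starting value at every date, while the variance keeps growing with t.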
5.2 Random Walk with Drift
A slightly different stochastic process is as follows:
$$Y_t = \delta + Y_{t-1} + u_t \tag{18}$$
where $u_t$ is the same random shock as before, and $\delta$ is a constant drift parameter.
Because of the drift, the Y variable gets shifted by a constant $\delta$ in addition to getting
shocked by $u_t$ in every period.
Question: Show that
$$E(Y_t) = Y_0 + t\delta \tag{19}$$
$$\mathrm{Var}(Y_t) = t\sigma^2 \tag{20}$$
Question: Confirm that the random walk with drift is in fact nonstationary.
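A simulation sketch of the random walk with drift, again assuming NumPy, normal shocks, and hypothetical values for the starting point, the drift, and the shock standard deviation, shows both moments moving with t.

```python
import numpy as np

rng = np.random.default_rng(7)
n_paths, T = 20000, 50
y0, delta, sigma = 10.0, 0.5, 1.0            # hypothetical start, drift, and shock s.d.
shocks = rng.normal(scale=sigma, size=(n_paths, T))
# Each period adds the drift plus a shock: Y_t = Y_0 + t*delta + u_1 + ... + u_t
paths = y0 + np.cumsum(delta + shocks, axis=1)

for t in (1, 10, 50):
    col = paths[:, t - 1]
    print(f"t={t:2d}  mean={col.mean():.3f} (Y0 + t*delta = {y0 + t * delta:.1f})  "
          f"variance={col.var():.3f} (t*sigma^2 = {t * sigma**2:.1f})")
```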
5.3 Unit Root
Consider a simple AR1 process
$$Y_t = \rho Y_{t-1} + u_t \tag{21}$$
When the AR1 process has a unit root, i.e. $\rho = 1$, it becomes a
random walk without drift. We know that the random walk is in fact nonstationary.
It can be shown (we don't need to know how to show this as yet) that the AR1 process
is stationary when $|\rho| < 1$. Therefore, a test for a unit root, i.e. $\rho = 1$, has become a
common test for nonstationarity.
The unit root is a useful concept because it provides us with a solution for the
nonstationarity problem. Suppose $Y_t$ is a nonstationary variable that has a unit root.
Then,
$$\Delta Y_t = Y_t - Y_{t-1} = u_t \tag{22}$$
When we take the differences in Y, rather than the level of Y, we get a stationary
variable.
Question: Why is $\Delta Y_t$ stationary? What is the expected value of $\Delta Y_t$? What is
the variance of $\Delta Y_t$? Does either of these change over time? (Hint: the answers are 0
and $\sigma^2$ respectively.)
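The effect of differencing can be seen directly in a simulated series. The sketch below assumes NumPy, normal shocks with a hypothetical standard deviation, and a starting value of zero. It builds one long random walk, differences it, and compares the mean and variance of the differences over the first and second halves of the sample.

```python
import numpy as np

rng = np.random.default_rng(8)
T, sigma = 10000, 2.0            # hypothetical sample size and shock s.d.
u = rng.normal(scale=sigma, size=T)
y = np.cumsum(u)                 # random walk without drift, taking Y_0 = 0
dy = np.diff(y)                  # Delta Y_t = Y_t - Y_{t-1}, which recovers the shocks

half = len(dy) // 2
first, second = dy[:half], dy[half:]
print("mean of differences (first half, second half):",
      round(first.mean(), 3), round(second.mean(), 3))
print("variance of differences (first half, second half):",
      round(first.var(), 3), round(second.var(), 3))
```

Both halves give roughly the same mean and variance, which is what stationarity of the differenced series looks like in a sample.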
Here is the solution, summarized: Suppose you have two nonstationary variables,
GDP and the Dow Jones index, and both of these have a unit root. Then you can avoid
spurious correlation by running a regression of the changes in GDP from year to year
($\Delta GDP_t = GDP_t - GDP_{t-1}$) on the changes in the Dow Jones index from year to
year ($\Delta DJ_t = DJ_t - DJ_{t-1}$). The incorrect and correct models, respectively, are
$$GDP_t = \beta_0 + \beta_1 DJ_t + \varepsilon_t \tag{23}$$
$$\Delta GDP_t = \beta_0 + \beta_1 \Delta DJ_t + \varepsilon_t \tag{24}$$
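The difference between the two specifications can be illustrated with a small Monte Carlo sketch. It assumes NumPy and uses pairs of independent simulated random walks rather than actual GDP or Dow Jones data; the helper `slope_t_stat`, the number of replications, and the sample size are hypothetical choices. For each pair it runs the regression in levels and in differences and records how often the slope looks significant at the conventional 1.96 cutoff.

```python
import numpy as np

def slope_t_stat(x, y):
    """t-statistic on the slope from an OLS regression of y on a constant and x."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (len(y) - 2)
    var_slope = sigma2 * np.linalg.inv(design.T @ design)[1, 1]
    return coef[1] / np.sqrt(var_slope)

rng = np.random.default_rng(9)
n_reps, T = 500, 200
reject_levels = reject_diffs = 0
for _ in range(n_reps):
    # Two independent random walks: by construction there is no true relationship
    x = np.cumsum(rng.normal(size=T))
    y = np.cumsum(rng.normal(size=T))
    reject_levels += abs(slope_t_stat(x, y)) > 1.96
    reject_diffs += abs(slope_t_stat(np.diff(x), np.diff(y))) > 1.96

share_levels = reject_levels / n_reps
share_diffs = reject_diffs / n_reps
print("share of 'significant' slopes in levels:     ", share_levels)
print("share of 'significant' slopes in differences:", share_diffs)
```

In levels, the unrelated series appear significantly related in most replications; in differences, the rejection rate falls back to roughly the nominal 5 percent.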
5.4 Testing for Unit Roots
Differencing solves the problem if our stochastic processes do in fact have a
unit root. In order to test whether there is a unit root, Dickey and Fuller proposed
estimating the following equation.
$$\Delta Y_t = \delta_0 + \delta_1 Y_{t-1} + \varepsilon_t \tag{25}$$
In this equation, the null hypothesis for a unit root is $H_0: \delta_1 = 0$. (Question:
Why does $\delta_1 = 0$ imply $\rho = 1$?) For a random walk without drift, we can test
additionally whether $\delta_0 = 0$.
Note: In the Dickey-Fuller test, we can't use the standard t-tests to ascertain
significance. Under the null hypothesis, the test statistics follow a special D-F distribution with its own set of
tables in the book. You don't need to know the specifics of this distribution, except
to note that the standard t-test should be replaced by the D-F test.
Say we find that $\delta_0 = 0$ and $\delta_1 = 0$. Then,
$$\Delta Y_t = \varepsilon_t \tag{26}$$
$$Y_t = Y_{t-1} + \varepsilon_t \tag{27}$$
That is, $Y_t$ has a unit root and is nonstationary, but $\Delta Y_t$ is stationary. We can
use the difference method.
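The Dickey-Fuller regression itself is an ordinary OLS regression; only the reference distribution changes. The sketch below assumes NumPy and two simulated series, one random walk and one stationary AR1 process with a hypothetical coefficient of 0.5; the helper `df_regression` is a name introduced for this example. It reports the estimated slope and its t-ratio, which would then be compared with D-F critical values rather than standard t critical values.

```python
import numpy as np

def df_regression(y):
    """OLS of dY_t on [1, Y_{t-1}]; returns (delta0_hat, delta1_hat, t-ratio on delta1)."""
    dy, y_lag = np.diff(y), y[:-1]
    design = np.column_stack([np.ones(len(dy)), y_lag])
    coef, *_ = np.linalg.lstsq(design, dy, rcond=None)
    resid = dy - design @ coef
    sigma2 = resid @ resid / (len(dy) - 2)
    se = np.sqrt(sigma2 * np.linalg.inv(design.T @ design)[1, 1])
    return coef[0], coef[1], coef[1] / se

rng = np.random.default_rng(10)
T = 500
u = rng.normal(size=T)
random_walk = np.cumsum(u)                 # rho = 1, so the true delta1 is 0
stationary = np.zeros(T)                   # rho = 0.5, so the true delta1 is -0.5
for t in range(1, T):
    stationary[t] = 0.5 * stationary[t - 1] + u[t]

for name, series in [("random walk", random_walk), ("AR1, rho=0.5", stationary)]:
    d0, d1, tau = df_regression(series)
    print(f"{name:14s} delta1_hat={d1:.3f}  t-ratio={tau:.2f}")
```

For the random walk the estimated slope is close to zero, while for the stationary series it is close to −0.5 with a large negative t-ratio.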