5. Time lags




Pushing a cat off a couch has an immediate negative influence on its mood. For a moment the cat’s a bit angry, and as long as it doesn’t show any vindictive tendencies no further consequences follow. A single increase in human disturbance (H) leads to a sharp, short-term drop in the mood (M).

Cutting off a cat’s tail, however, may produce quite a different outcome. While one knife slash is still perceived as a single increase in H, the cat may not recover from it for quite some time. Reflecting this situation in our model requires the application of a time lag. Apart from the butchery, the concept also proves quite useful in economics.



#1: Finite Distributed Lags

Since the stump example seems to trigger one’s imagination, we’ll stick to it. Let’s assume that we check up on the cat’s mood once a week and that the unlucky day fell on the first one. We also know that the cat is rather unforgiving, so it’ll take 4-5 weeks for things to go back to normal. To make things simpler we’ll get rid of all the other variables.

M = constant + H1 + H2 + H3 + H4 + H5

The instant impact on the cat’s mood equals H1. H2, H3, H4 and H5 represent the second, third, fourth and fifth week respectively. The overall effect of cutting off the tail is the sum of all the Hs in the equation. In other words, the damage to the cat’s tail will affect its mood for 5 weeks after the massacre. We’ll be able to calculate its scope as soon as gretl unfolds all the H values.
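To see the mechanics outside gretl, here is a minimal Python sketch. Everything in it is made up for illustration: the lag weights in `beta`, the constant and the simulated H series are assumptions, not estimates from any real cat. The point is only that regressing M on H and its four lags recovers the weights, and that the long-run multiplier is simply their sum.

```python
import numpy as np

# Assumed lag weights: the effect of a unit increase in H on M in the
# week of the event and in each of the four weeks after it.
beta = np.array([2.0, -1.5, -1.0, -0.5, -0.25])
const = 10.0

rng = np.random.default_rng(0)
H = rng.integers(0, 2, size=60).astype(float)   # weekly disturbance series

# Regressor matrix: [1, H_t, H_{t-1}, H_{t-2}, H_{t-3}, H_{t-4}]
T = len(H)
X = np.column_stack([np.ones(T - 4)] + [H[4 - k: T - k] for k in range(5)])
M = X @ np.concatenate([[const], beta])          # noise-free mood series

coef, *_ = np.linalg.lstsq(X, M, rcond=None)
long_run = coef[1:].sum()   # long-run multiplier: the sum of all H coefficients
```

With noise-free data the recovery is exact; with real data the coefficients would come with standard errors, which is what gretl reports.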
In accordance with the previous chapter we transform M into its first differences (select M, then Add > First differences...). Then, creating a model, we add 4 lags of H as independent variables. There’s a ‘lags...’ button at the bottom of the ‘specify model’ window. They should appear in the ‘Regressors’ column.

                             


The generated model gives coefficients for H1 and its 4 lags (H2, H3, H4, H5). Adding them up gives the value of the long-run multiplier. H1 indicates the impact on the cat’s mood in the very week of the accident. According to the model pasted below, the cat actually enjoyed having its body part sliced off, at least initially. That somehow explains my strong preference for dogs.


                      

#2: Infinite Distributed Lags

No more blood, we’re back to mundane everyday life. Once a day the cat tucks into its munchies. What can’t be eaten gets immediately hidden behind the couch, where the cat stores its supplies. Let’s say it’s always ¼ of the bowl. As soon as the cat senses the first symptoms of returning hunger, it feasts on half of what’s hidden behind the sofa, excluding today’s crop. Supposing that M depends on F exclusively, then:

M = [0.75 × F Fri] + [0.25 × 0.5 × F Thu] + [0.25 × 0.25 × F Wed] + [0.25 × 0.125 × F Tue] + ... etc.

The equation could continue ad infinitum, making itself unsolvable. The idea is that a single serving provides the cat with an infinite stimulus. Calculating the overall food intake and its impact on the cat’s mood requires a simplification of the formula. Since this simplified one is not so simple, we’ll use gretl to show how it works, only this time we’ll base the model on days instead.
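Before the simplification, it’s worth checking that the infinite sum is at least finite. A quick sketch, with the weights exactly as in the story, shows the daily weights form a geometric series whose total tends to 1 – the cat eventually eats everything it’s served:

```python
# Intake weights from the story: 0.75 of today's bowl, then half of the
# shrinking stash on each following day (0.25*0.5, 0.25*0.25, 0.25*0.125, ...).
weights = [0.75] + [0.25 * 0.5 ** k for k in range(1, 50)]
total = sum(weights)   # the geometric tail sums to 0.25, so the total tends to 1
```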
                       


Again we take the first differences of M (d_M), evading its nonstationarity, and we add one lag of F. Surprising how those prior 4 weeks took more lags than infinity does. This is the promised simplification.


                           
The F coefficient gives the immediate effect of a serving of F on the cat’s mood. The long-run multiplier is given by the formula:

F coefficient / (1 - F_1 coefficient)
0.331132 / (1 - 0.011960) = 0.3351
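The arithmetic, spelled out (the two coefficients are the ones read off the model above):

```python
b_F = 0.331132   # F coefficient: the immediate impact of a serving
lam = 0.011960   # coefficient on the lagged term
long_run = b_F / (1 - lam)   # about 0.3351
```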

Since the difference between the F coefficient and the long-run multiplier is so tiny, we can conclude that the cat derives nearly no additional profit from its stock. It’s the stupidity of either the model, the data or the cat. The problem with econometrics is that you can never be sure which one is to blame.


4. Time series data







It was no coincidence that our exemplary model was a cross-sectional one. Such models are much simpler to create, since cats, due to their geographic dispersion, don’t influence each other’s behavior. Taking a time series into account, we have to deal with phenomena like trends or seasonality. Granny’s cat, for example, getting older and older, becomes harder to please – that’s a trend. Such phenomena distort a model and hamper comparison of variables. We’ll dodge them (gretl will), but first we need to comprehend a couple of things.



There are two types of time-series data. The one on the left resembles a typical cross-sectional data graph. It’s called stationary, meaning its expected value, variance and covariance stay constant over time, the way they did in the cross-sectional model. It’s effectively random, so all of the previous analysis tools will work properly. The trouble begins as we proceed to the right.



 
That’s a non-stationary series. There is a trend present, meaning we can no longer use any analysis tool we’ve already learned until we make some arrangements first. The difference is that even before we check, we’ve got a slight idea what the next value of the variable will be. That’s not random.

We’ve got a ‘base’ and in each following period of time we add to it ‘something’. As time goes by this ‘something’ becomes part of the ‘base’.

y0 = base
y1 = base + something 1
y2 = (base + something 1) + something 2
y3 = (base + something 1 + something 2) + something 3
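The accumulation is easy to sketch with made-up numbers; the point is only that y3 ends up equal to the base plus all three ‘somethings’:

```python
base = 5.0
shocks = [0.3, -0.1, 0.7]   # made-up 'somethings'
y = [base]
for s in shocks:
    y.append(y[-1] + s)     # each 'something' is absorbed into the base
```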

The main idea is to separate the red from the blue. But again there are two main types of base+something models we can encounter and each has to be treated accordingly.


#1: Stochastic model

Stochastic means random. If variables are so random, then what’s the whole point of discussing them in the first place? Well, they are random only to some extent. Again, let’s assume there’s a cat sleeping on a couch. Chances are good that if we close our eyes, count to 10 and check again, it’ll still be asleep. At least we expect it to be so. We know that it’s highly improbable that within those 10 seconds the cat would depart to the moon or take up a cooking class. At best it could have woken up and jumped off the couch. So as you can see, the randomness is quite limited. The limits are set by the cat’s last activity. If the last thing we saw the cat doing before closing our eyes was entering a space shuttle at Cape Canaveral, well, then it would be highly probable that it was actually departing to the moon. Therefore:


activity 1 = activity 0 + something 1

y1 = y0 + ε1


what we expect the next activity to be = present activity

E(y1) = y0

So y1 is related to y0 and y2, but y0 doesn’t directly influence y2, y3, y4 etc. This is called a first-order autoregressive process. The greater the number of following values directly determined by y0, the higher the order of autoregression. In the previous model we analyzed a regression of one variable on another. Now it is the same variable regressed on itself.
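The ‘best forecast is the present value’ property can be checked by brute force. A sketch with assumed numbers: start at y0 = 3 and average a large number of simulated one-step moves; the average lands back at y0.

```python
import random

random.seed(42)                # fixed seed so the sketch is reproducible
y0 = 3.0
# Many draws of y1 = y0 + epsilon, with epsilon a standard normal shock:
moves = [y0 + random.gauss(0, 1) for _ in range(100_000)]
expected_next = sum(moves) / len(moves)   # close to y0, i.e. E(y1) = y0
```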


#2: Deterministic model

In opposition to the limited randomness there’s continuity. This is the cat getting older and older. We know that one day we’ll bury it in the backyard. It’s not random; that’s something to be sure of. So we look at the cat, close our eyes, count to ten, and there it is – 10 seconds older. No matter how many times we do it, the cat is always heading in the same direction. Let’s assume that the observed variable here is the cat’s speed. Obviously, as time goes by it’s getting slower; that’s our trend. There will of course be some random moments when the cat goes berserk, running around the house, or quite the contrary, lies all day by the fire. Those are what we cannot predict, the noise distorting our observations. Therefore:

speed t = constant (some average speed each non-dead cat performs) + time trend t + something t (random amok/laziness)


Generally speaking, the values are not related to one another. They’re determined by the time instead. Since time always goes in one direction, we might get the wrong impression that y1 depends on y0 and so on. That’s why stochastic models and deterministic ones are frequently confused. Unluckily, we have to be sure which type of model we’re dealing with, since there’s a different approach meant for each of them.



#3: Stochastic model approach

While a stochastic model is not stationary, its deltas are. Those are our ‘something’ separated from the base. We can then substitute the variables in the model (F, C, H...) with the values of their deltas.

Given that the variable shows first-order autoregression, then instead of F1, F2, F3, F4 ... we would have

Δ F1 = (F1 - F0)
Δ F2 = (F2 - F1)
Δ F3 = (F3 - F2)  ...

Those are calculated in gretl using the Add menu (menu bar).



Differences of variables in gretl are prefixed with ‘d’, so instead of F we’ll have d_F. Those d values are the ones we put into the model in place of the original ones. Then we can get back to the procedure of model creation that we’ve already learned.
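For reference, the d_ values gretl produces are nothing more exotic than consecutive subtractions; a sketch with made-up F values:

```python
# First differences, computed by hand (made-up F values):
F = [10.0, 10.4, 10.1, 10.9, 11.2]
d_F = [b - a for a, b in zip(F, F[1:])]   # gretl would call this d_F
```

Note that differencing always costs one observation: five values of F give only four differences.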



#4: Deterministic model approach

We know that in this kind of model some part of the variable F is determined by the passing time. What we would like to do is pull out only the pure F values, omitting this time-contaminated part. To do so, we first have to figure out the relation between time and F, and thus create a model F = time. A time variable can be added to the data using Add > Time trend.



Once we get the model we save its residuals – the differences between the estimated values and the real ones; those are our pure Fs. Select Save > Residuals.
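The whole detrending procedure fits in a few lines of Python. The trend slope, intercept and noise level below are assumptions for illustration; the point is that the residuals from the F-on-time regression are the detrended series, with the trend stripped out.

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(100, dtype=float)
F = 2.0 + 0.5 * t + rng.normal(0, 1, size=100)   # assumed trend plus noise

# Regress F on a constant and the time trend, then keep the residuals:
X = np.column_stack([np.ones_like(t), t])
coef, *_ = np.linalg.lstsq(X, F, rcond=None)
resid = F - X @ coef                              # the 'pure', detrended F
```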




Then once more create the model you had in mind, replacing F with its residuals’ values.

Of course you can do all of that absentmindedly, clicking where told. That’s what you do at the exams, but that’s not the case here. Creating a model of your own requires understanding, even if done in blissful ignorance of the mathematical dimension of the issue. Once you comprehend the simple, you’ll be able to explore the rest.





3. More testing



Apart from the indicators automatically spat out by gretl, we can conduct a couple of additional tests. Do not fear, because it takes no more than a couple of mouse clicks. Once again the program will calculate everything for us; we only have to know where to look for the answers. Of course, the question may arise: what do we need all this testing for? Well, as mere human beings, we have a great capacity to err. Each mistake distorts the model, making it grow further apart from reality. Why build a model in the first place if we don’t want it to reflect anything?


#1: Testing for collinearity

Collinearity is simply a relationship between two independent variables. Think of a cat caressed only while it’s asleep. This would cause the values for sleep and caress to behave alike. Of course, some level of collinearity doesn’t do any harm to the model. In economics it’s perfectly normal that some indicators sway together in response to the markets’ mood. We just have to make sure that those similarities in reactions lie within safe limits.

Having our previous model already calculated we select: Tests > Collinearity (menu bar).



Since everything has already been explained by the program, I can only repeat: check if any variable has a VIF value exceeding 10. Our results seem OK (3.959 being the highest number in the column). A piece of cake.
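If you want to see what gretl computes under the hood, a VIF is just 1/(1 − R²) from regressing one independent variable on all the others. A sketch on made-up data, where the first two columns are deliberately near-collinear:

```python
import numpy as np

def vif(X):
    """VIF for each column: regress it on the others and take 1 / (1 - R^2)."""
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        tss = ((y - y.mean()) ** 2).sum()
        out.append(tss / (resid @ resid))   # equals 1 / (1 - R^2)
    return out

# Made-up data: the first two regressors move almost together.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.1 * rng.normal(size=200)   # nearly a copy of a
c = rng.normal(size=200)             # unrelated to both
vifs = vif(np.column_stack([a, b, c]))
```

Here `vifs[0]` and `vifs[1]` blow far past the 10 threshold, while the innocent third column stays near 1.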


#2: RESET (Regression Specification Error Test)

This test checks whether the model has a correct structure. At the beginning we assumed that the relations between the independent variables and the cat’s mood are linear. This is just the simplest possible form of relation between sets of values. As you know there are no squares nor logarithms in our formula, but there could have been. RESET won’t tell us what the appropriate formula should be; it will just inform us whether the one we’ve chosen is wrong, and how wrong it is.

Once again select: Tests > Ramsey’s RESET
                  


The result of the test is a new model with two new variables, yhat^2 and yhat^3. These are squares and cubes of the fitted values of the cat’s mood. In a correctly constructed model those should not be significant, i.e. the p-values of their t-ratios should be greater than 0.05 (α). And indeed they are.

Another approach is checking the F statistic, as we did in the Wald test.

H0: the added variables are not significant

H1: the opposite

p-value 0.227 > α 0.05, so we stick to H0, which only confirms our previous conclusions.
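The test itself is nothing magical. A hand-rolled sketch on made-up, genuinely linear data: fit the base model, append yhat² and yhat³, and form the F statistic for the pair (gretl also converts it to a p-value, which needs the F distribution and is skipped here):

```python
import numpy as np

# Made-up data where the true relation really is linear:
rng = np.random.default_rng(2)
n = 120
F, C = rng.normal(size=n), rng.normal(size=n)
M = 1.0 + 2.0 * F + 0.5 * C + rng.normal(size=n)

def fit_rss(X, y):
    """OLS coefficients and residual sum of squares."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, ((y - X @ coef) ** 2).sum()

X_r = np.column_stack([np.ones(n), F, C])
coef_r, rss_r = fit_rss(X_r, M)
yhat = X_r @ coef_r

# RESET: re-estimate with yhat^2 and yhat^3 added, then F-test the pair.
X_u = np.column_stack([X_r, yhat ** 2, yhat ** 3])
_, rss_u = fit_rss(X_u, M)
F_stat = ((rss_r - rss_u) / 2) / (rss_u / (n - X_u.shape[1]))
```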


#3: Omitted variable test

This one is the most comprehensible. We indicate a variable we don’t like and the program checks if we are better off without it. If you like, you can repeat the test for each variable, but at a higher level of initiation you’d already be able to tell which one stinks.

Select: Tests > Omit variable > Wald omit test, and choose a variable you don’t like the most.
                      
   

I’ve chosen H. One more time we are given the Wald test statistic (F). Every time it works the same way. Two hypotheses, H0: the variable is useless, H1: it’s not.

p-value 0.457033 > α 0.05, so we throw out H.


#4: Davidson–MacKinnon test

Time for dessert. It’s a bit more complicated, meaning it requires more clicking. We have to go back to the creation of a model. Then, instead of the one we already have, we construct two models with divergent variable sets. That is:

M = F + C

M = H + S

or any other combination you can think of. Just remember that both models have to include the same number of variables. I wonder what happens when the original model has an odd number of those.

After creating the first model we select Save > Fitted value

What we’ll get is a new variable, yhat1, automatically added to the list of variables in gretl’s main window. Repeat the procedure for the second model (Save > Fitted...) and you’ll obtain another variable, yhat2.

                    

Another name for fitted value is predicted value, which is a bit clearer. So yhat1 is a column of cats’ moods predicted by the model based on F and C. The same goes for yhat2. Now, once again, we create two separate models.

1:   M = F + C + yhat2

2:   M= H + S + yhat1



In each model we have to check whether the yhat variable is valid or not. I hope you remember that it’s the p-value of the t-ratio that decides. In this exemplary model yhat2 has a p-value of 0.1946. It’s greater than α, meaning it lacks statistical significance. The little F + C model is complete without it; it works.

In the Davidson–MacKinnon test we can accept or reject both models. There’s no need to choose between a winner and a loser.
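The clicking can be condensed into a sketch. All data below are made up, with the F + C model as the truth; the t-ratio on yhat2 in the augmented model is the number whose p-value the test inspects:

```python
import numpy as np

# Made-up data: M really is driven by F and C only.
rng = np.random.default_rng(3)
n = 150
F, C, H, S = (rng.normal(size=n) for _ in range(4))
M = 1.0 + 2.0 * F + 1.0 * C + rng.normal(size=n)

ones = np.ones(n)

def fitted(X, y):
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ coef

yhat1 = fitted(np.column_stack([ones, F, C]), M)   # model 1's predictions
yhat2 = fitted(np.column_stack([ones, H, S]), M)   # model 2's predictions

# Model 1 re-estimated with the rival's fitted values appended;
# the t-ratio on yhat2 decides whether model 1 survives.
X1 = np.column_stack([ones, F, C, yhat2])
coef1, *_ = np.linalg.lstsq(X1, M, rcond=None)
resid = M - X1 @ coef1
s2 = (resid @ resid) / (n - X1.shape[1])
cov = s2 * np.linalg.inv(X1.T @ X1)
t_yhat2 = coef1[-1] / np.sqrt(cov[-1, -1])   # compare its p-value with alpha
```

The mirror-image regression (model 2 plus yhat1) works the same way, which is how the test can end up accepting or rejecting both models at once.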

I guess we are done testing for now, though it’s one of the most important parts of the modeling process. After the creation of a hypothetical model there’s always a list of possible defects waiting to be ticked off. Never skip it. A defective model can do more harm than good.


2. Technology serving science - introduction to econometric software



#1: Stealing gretl

Luckily it’s one of those open source boons. It can be downloaded from http://gretl.sourceforge.net/


#2:  Creating a model    
                                                                              
Once you download it you need to import your data. The data file should be saved as an Excel 97-2003 document. More up-to-date versions are beyond gretl’s reach.

       
                      
If done successfully, you’ll be able to have a glimpse at each variable separately (double-click on the variable’s tag). You can check each one just to make sure that everything went as planned.


After this initial familiarization with the new environment select:                      
Model > Ordinary Least Squares, then arrange variables as shown below and click OK.



                         





In less than a second, the program generates exactly the same parameters that we’ve already calculated in Excel. While that’s not very motivating, the program did it faster and more accurately, producing plenty of additional indicators.

 
#3: Interpreting the results

The numbers are a pure work of fiction, but we can still interpret them to show how it works. We’ll omit some of the more complicated indicators, since at this level of knowledge they would at best litter our minds.


Coefficients (also known as parameters)
                                                       
We already know that these are the consecutive elements of the equation, and we’ve also mastered how to put them in the right order. Still, we haven’t yet mentioned how exactly they should be understood. For example:

If the F coefficient equals 22.15, it means that, ceteris paribus[1], each additional gram of food makes the cat (on average) 22.15 units happier.

The H coefficient, the negative one, indicates that increasing human disturbance by 1 unit causes, on average, a 14.7-unit decrease in the cat’s mood (ceteris paribus).

S, the only zero-one variable in the model, can be interpreted as follows: the cat’s mood, ceteris paribus, is on average better by 7.25 units if the cat has slept (that is, if S equals 1).

The last question concerning those parameters is what the constant means. The idea is to put 0 in place of all the remaining variables, meaning that the cat didn’t get any food, caress, etc. Then the default mood would be 13. The imaginary graph of the cat’s mood would thus originate from the point (0, 13). Sometimes the constant cannot be interpreted, because in the real world the variables would never equal zero.
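Reading off a prediction from these coefficients is plain arithmetic. The numbers below are the illustrative ones from this section (constant 13, F 22.15, H −14.7, S 7.25, plus the C coefficient of 5 used in the standard-error example), not real estimates:

```python
# Illustrative coefficients from the text, not real estimates:
const, b_F, b_C, b_H, b_S = 13.0, 22.15, 5.0, -14.7, 7.25

def mood(F, C, H, S):
    """Predicted mood for given food, caress, disturbance and sleep values."""
    return const + b_F * F + b_C * C + b_H * H + b_S * S

# A fed (2 g), caressed (1 unit), undisturbed, well-rested cat:
m = mood(F=2, C=1, H=0, S=1)   # 13 + 44.30 + 5 + 0 + 7.25
```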


Standard errors

These are estimators of the variance of the coefficients’ distribution. They tell you how much the parameters of different cats would sway around the calculated coefficients. The bigger the standard error, the weaker the model. But as long as it doesn’t exceed half of the coefficient, it’s OK. The mathematical form of the rule goes as follows:

(standard error/coefficient) x 100 < 50


Calculating that for variable C we’ll get:

(0.0235 / 5) x 100 < 50

0.47 < 50
                           
This means that the variable’s parameter is good – too good, I would say. Since the numbers in the example are quite random, I guess the indicators will keep reaching extremes.


R-squared (R2)

It’s the ratio of the variance explained by the model to the variance of the dependent variable. It tells you how much of the change in the cat’s mood is explained by the changes in the quantity of food, caress, etc. In other words, how well the model we created explains the mood’s alterations.

R2 varies from 0 to 1, 0 being the worst and 1 the best possible model. There is no fixed number separating the good from the bad. In big cross-sectional models based on international statistics, R2 should fall within the 0.30 – 0.70 range, while for the little ones concerning individual households, companies or cats the acceptable R2 values vary from 0.05 to 0.40. Those limits get stricter in time-series models, where R2 is expected to be greater than 0.70.

Our little cat-based model, with its R2 amounting to 0.1010, seems all right.
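R² can be reproduced by hand from its definition; the observed and fitted moods below are made up:

```python
import numpy as np

y = np.array([3.0, 4.0, 5.0, 7.0, 6.0])      # observed moods (made up)
yhat = np.array([3.5, 3.8, 5.2, 6.0, 6.5])   # the model's fitted moods (made up)

rss = ((y - yhat) ** 2).sum()                # unexplained variation
tss = ((y - y.mean()) ** 2).sum()            # total variation
r2 = 1 - rss / tss                           # here: 1 - 1.58/10 = 0.842
```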


Adjusted R2


The tricky thing about the R2 indicator is that its value increases as we put more and more variables into the model, and so the temptation arises to come up with as many variables as possible. Adjusted R2 puts an end to this sick fantasy. It measures exactly the same thing as plain R2 does. The only difference is that this indicator punishes us for each additional variable added to the model, deducting a little from the original indicator. That’s why adjusted R2 is always a bit smaller than R2. Though I’ve never seen it take a negative value...


p-value of t-ratio  (empirical significance value)

In order to interpret this one we have to switch on more abstract thinking. So there’s a hypothesis that a variable (let’s say F) lacks empirical significance. The term empirical significance comes from statistics and tells us whether something is important or not. So one more time:

hypothesis 0: F is not important to the model
and its alternative, hypothesis 1: F is important to the model

There’s also given a so called significance level (α) of 0.05. If:

p-value > α: we stick to hypothesis 0; F is not important to the model
p-value < α: we accept the alternative hypothesis 1; F is important to the model

It all comes down to a very simple thing. You look at the p-value column. Anything bigger than 0.05 – bad, not important.

In our model only the constant has empirical significance (constants nearly always have it), so as you can see it’s not a very good model.

The t-ratio column is given so that you can check for yourself which hypothesis to accept and which to reject, using a statistical table of Student’s t-distribution. Since we’ve already got the p-value, that seems rather pointless.


Wald test (F statistic)

It checks the same thing as the t-ratio significance test, except that the latter does it for each variable separately, while the Wald test checks whether the variables make sense all together. Instead of Student’s t, it is based on the F distribution. One more time we are given both the F value (0.927766) for our own calculations and the p-value (0.459739), which makes things much easier.

H0 says that all variables taken together are unimportant to the model.
The alternative, H1, states the opposite.
The significance level is always the same, α = 0.05.

p value 0.459739 > α 0.05

Therefore we have to accept H0 and admit that the model is totally wrong. Not that it comes as a surprise.



Information criterion

Schwarz, Akaike and Hannan–Quinn are all information criteria. Their values serve comparison purposes only: if you have two models and wonder which to choose, you pick the one with the lower values of the information criteria. The values themselves tell you nothing more.
                           
That would be all for the most basic analysis of a model. Not much, but as you can see it can already tell whether a model makes sense or not. This one clearly makes none.

              




[1] ceteris paribus – ‘with other things the same’. That’s just a formal requirement: each time you interpret something you have to add ‘ceteris paribus’ somewhere in the sentence. I’m never quite sure where to put it; just make sure it’s there, anywhere.