4. Time series data

It was no coincidence that our example model was a cross-sectional one. Such models are much simpler to build, since cats, being geographically dispersed, don’t influence each other’s behavior. With a time series we have to deal with phenomena like trends or seasonality. Granny’s cat, for example, gets harder to please as it grows older: that’s a trend. Trends and seasonality distort a model and hamper the comparison of variables. We’ll dodge them (gretl will), but first we need to understand a couple of things.

There are two types of time-series data. The one on the left resembles a typical cross-sectional data graph. It’s called stationary, meaning that its expected value, variance and autocovariance stay the same over time: the values fluctuate randomly around a fixed level, much as they did in the cross-sectional model. That is why all of the previous analysis tools will work properly. The trouble begins as we proceed to the right.

That’s a non-stationary series. There is a trend present, which means we can no longer use any of the analysis tools we’ve already learned until we make some adjustments first. The difference is that even before we check, we have a rough idea of what the next value of the variable will be. That’s not random.

We’ve got a ‘base’, and in each following period of time we add ‘something’ to it. As time goes by, this ‘something’ becomes part of the ‘base’.

y0 = base
y1 = base + something 1
y2 = (base + something 1) + something 2
y3 = (base + something 1 + something 2) + something 3
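The build-up above can be sketched in a few lines of Python. This is only an illustration: the base of 10 and the three ‘somethings’ are made-up numbers, not data from our model.

```python
from itertools import accumulate

base = 10                 # hypothetical starting value (y0)
somethings = [2, -1, 3]   # hypothetical something 1, something 2, something 3

# Each new value is the previous value (the new 'base') plus the next 'something'.
y = list(accumulate(somethings, initial=base))
print(y)  # [10, 12, 11, 14]

# Going the other way, subtracting neighbours gives the 'somethings' back.
recovered = [curr - prev for prev, curr in zip(y, y[1:])]
print(recovered)  # [2, -1, 3]
```

The second half of the sketch is the whole trick in miniature: the ‘something’ can be pulled back out of the series once we know how it was added.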

The main idea is to separate the red from the blue. But again there are two main types of base+something models we can encounter and each has to be treated accordingly.

#1: Stochastic model

Stochastic means random. If the variables are so random, then what’s the point of discussing them in the first place? Well, they are random only to some extent. Again, let’s assume there’s a cat sleeping on a couch. Chances are good that if we close our eyes, count to 10 and check again, it’ll still be asleep. At least we expect it to be so. We know it’s highly improbable that within those 10 seconds the cat would depart for the moon or take up a cooking class. At most he could have woken up and jumped off the couch. So as you can see, the randomness is quite limited. The limits are set by the cat’s last activity. If the last thing we saw the cat doing before closing our eyes was entering a space shuttle at Cape Canaveral, well, then it would be highly probable that he was actually departing for the moon. Therefore:

activity 1 = activity 0 + something 1

y1 = y0 + ε1

what we expect to be the next activity  =  present activity

E(y1) = y0

So y1 is directly related to y0 and y2, but it doesn’t directly determine y3, y4, y5 etc. This is called a first-order autoregressive process. The more of the following values y0 directly determines, the higher the order of autoregression. In the previous model we analyzed a regression between two different variables. Now it is a single variable regressed on its own past values.
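To see the ‘what we expect next = what we have now’ idea at work, here is a minimal simulation sketch. It assumes the ‘something’ is normally distributed noise with mean zero; the function name random_walk, the starting value 5.0 and the seed are invented for illustration.

```python
import random
import statistics

def random_walk(y0, n_steps, rng, sigma=1.0):
    """Return [y0, y1, ..., yn], where each value is the previous one plus noise."""
    path = [y0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, sigma))
    return path

rng = random.Random(42)   # fixed seed so the run is repeatable
y0 = 5.0                  # hypothetical 'present activity'

# Take one step from the same y0 many times and average where we land.
next_values = [random_walk(y0, 1, rng)[-1] for _ in range(20000)]
mean_next = statistics.fmean(next_values)
print(round(mean_next, 2))  # should come out close to y0
```

Any single step may land above or below y0, but on average the next value sits right where the present one is.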

#2: Deterministic model

The opposite of limited randomness is continuity. This is the cat getting older and older. We know that one day we’ll bury it in the backyard. It’s not random, that’s something we can be sure of. So we look at the cat, close our eyes, count to ten, and there he is, 10 seconds older. No matter how many times we do it, the cat is always heading in the same direction. Let’s assume that the observed variable here is the cat’s speed. Obviously, as time goes by he gets slower: that’s our trend. There will of course be some random moments when the cat goes berserk and runs around the house or, quite the contrary, lies by the fire all day. Those are what we cannot predict, the noise distorting our observations. Therefore:

speed t = constant + time trend t + something t

constant: some average speed each non-dead cat performs
time trend t: the part determined by the passing time
something t: random amok/laziness

Generally speaking, the values are not related to one another. They’re determined by time instead. Since time always goes in one direction, we might get the wrong impression that y1 depends on y0 and so on. That’s why stochastic and deterministic models are frequently confused. Unfortunately we have to be sure which type of model we’re dealing with, since each calls for a different approach.
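A deterministic trend can be sketched the same way. The numbers below (a constant of 30, a slowdown of 0.1 per period, normally distributed noise) are assumptions made up for the example, and trend_series is a hypothetical helper name.

```python
import random
import statistics

def trend_series(constant, slope, n, rng, sigma=1.0):
    """speed_t = constant + slope * t + something_t, for t = 1..n."""
    return [constant + slope * t + rng.gauss(0.0, sigma) for t in range(1, n + 1)]

rng = random.Random(0)    # fixed seed so the run is repeatable
speed = trend_series(constant=30.0, slope=-0.1, n=200, rng=rng)

# Each value depends on t alone, never on the previous value,
# yet the average level still drifts as time passes.
first_half = statistics.fmean(speed[:100])
second_half = statistics.fmean(speed[100:])
print(round(first_half, 1), round(second_half, 1))
```

The two halves have clearly different averages, so the series is not stationary, even though no value was ever built from the one before it.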

#3: Stochastic model approach

While a stochastic model is not stationary, its deltas (the differences between consecutive values) are. Those are our ‘something’ separated from the base. We can then replace the variables in the model (F, C, H...) with their deltas.

Given that the variable shows first-order autoregression, instead of F1, F2, F3, F4 ... we would have

Δ F1 = (F1 - F0)
Δ F2 = (F2 - F1)
Δ F3 = (F3 - F2) ...

Those are calculated in gretl using the Add menu on the menu bar (Add > First differences of selected variables).

Differences of variables in gretl are prefixed with ‘d_’, so instead of F we’ll have d_F. Those d_ values are the ones we put in the model instead of the original ones. From here we can go back to the model-building procedure we’ve already learned.
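For those who like to see what happens behind the menu click, the differencing itself is one line of arithmetic. This Python sketch is not gretl’s own code; the values of F are invented and first_differences is a hypothetical helper.

```python
def first_differences(values):
    """Return each value minus the one before it (one item shorter than the input)."""
    return [curr - prev for prev, curr in zip(values, values[1:])]

F = [3, 5, 4, 8, 9]          # hypothetical observations F0..F4
d_F = first_differences(F)
print(d_F)  # [2, -1, 4, 1]
```

Note that F0 has no predecessor, so there is no delta for it and the differenced series has one observation fewer.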

#4: Deterministic model approach

We know that in this kind of model some part of variable F is determined by the passing time. What we would like to do is pull out only the pure F values, leaving out the time-contaminated part. To do so, we first have to figure out the relation between time and F, so we create a model with F as the dependent variable and time as the explanatory one (F = time). The time variable can be added to the data using Add > Time trend.

Once we have the model, we save its residuals: the differences between the real values and the estimated ones. Those are our pure Fs. Select Save > Residuals.

Then build the model you had in mind once more, replacing F with the values of its residuals.
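The same steps (regress F on time, keep the residuals) can be sketched by hand in Python. This is an illustration under assumptions, not what gretl runs internally: it fits a straight line with a constant by ordinary least squares, the data are made up, and detrend is a hypothetical helper name.

```python
import statistics

def detrend(values):
    """Regress values on a time index 1..n (plus a constant); return slope, intercept, residuals."""
    n = len(values)
    time = list(range(1, n + 1))
    t_mean = statistics.fmean(time)
    v_mean = statistics.fmean(values)
    # Ordinary least squares slope and intercept for a single explanatory variable.
    slope = (sum((t - t_mean) * (v - v_mean) for t, v in zip(time, values))
             / sum((t - t_mean) ** 2 for t in time))
    intercept = v_mean - slope * t_mean
    # Residual = real value minus the value the trend line predicts.
    residuals = [v - (intercept + slope * t) for t, v in zip(time, values)]
    return slope, intercept, residuals

# Hypothetical F: a straight trend plus small bumps.
bumps = [0.3, -0.3, 0.1, -0.1, 0.2, -0.2, 0.4, -0.4]
F = [2.0 + 0.5 * t + bump for t, bump in zip(range(1, 9), bumps)]

slope, intercept, resid = detrend(F)
print(round(slope, 3), [round(r, 3) for r in resid])
```

The residuals are what is left of F once the part explained by time is taken away, which is exactly the series we feed into the final model.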

Of course you can do all of that absentmindedly, clicking where told. This is what you do at exams, but that’s not the point here. Creating a model of your own requires understanding, even if you remain blissfully ignorant of the mathematical side of the issue. Once you comprehend the simple things, you’ll be able to explore the rest.