1. The idea behind an econometric model

1#: Encountering a research problem

2#: Defining a research problem.

The cat is in a bad mood. The cat poses a threat to the furniture. We like the furniture. We want the cat to be in a good mood. We have to provide the cat with mood enhancers and prevent The Annoying. We do not know to what extent various stimuli influence the cat. We are in need of a simulation – an econometric model.

3#: Choosing variables

There are two types of variables:

a. Dependent variable – the one we are about to examine (the cat’s mood), tagged M

b. Independent variables (regressors) – all of the hypothetical mood enhancers and reducers:

-food (given in grams) tagged F
-caress (given in minutes) tagged C
-human disturbance (given in annoyance units) tagged H
-sleep (has slept – good; hasn’t – bad) tagged S

While the first 3 variables are quantitative (their values are measured in units), the variable S is qualitative. It is therefore coded as a number that can take only two values: 1 if the cat has slept, 0 if he hasn’t.
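The 1/0 coding can be sketched in a few lines of Python. The function name encode_sleep and the sample observations are made up for this illustration; they are not part of the original data.

```python
def encode_sleep(has_slept: bool) -> int:
    """Code the qualitative variable S: 1 if the cat has slept, 0 otherwise."""
    return 1 if has_slept else 0

# Hypothetical observations: slept, did not sleep, slept.
observations = [True, False, True]
S = [encode_sleep(x) for x in observations]
print(S)  # [1, 0, 1]
```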

In order to make it more comprehensible we put all of the above-mentioned variables in an equation, with pluses preceding enhancers and minuses in front of reducers. Thus:

M = F + C – H + S

To please the academics we have to put the Greek letter epsilon at the end of the equation. It stands for the 'error term' and represents all the things that we could have possibly messed up in the formula.

Moreover, at the front of the right-hand side of the equation we put a constant. We do not yet know what its value is going to be, but it will certainly be there at the end of the modeling process. Therefore the formula should look as follows:

M = const + F + C – H + S + epsilon

It indicates that the value of M depends on the values of the other variables. This relationship is called regression – it’s the most basic concept in the whole of econometrics.
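In textbook notation the same idea is usually written with an unknown coefficient in front of every variable. The sketch below uses beta symbols and an observation index i, which are assumptions of this note rather than notation used elsewhere in the text:

```latex
M_i = \beta_0 + \beta_1 F_i + \beta_2 C_i + \beta_3 H_i + \beta_4 S_i + \varepsilon_i
```

Here \beta_0 plays the role of const, and the minus in front of H would be expected to show up as a negative estimate of \beta_3.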

4#: Gathering data

To gather data we have to resort to internet sources or (which is sometimes impossible) write down our own observations. Regardless of the chosen method, there is a minimum number of observations (n) the model requires to work properly. It depends on the number of independent variables (k) included and is given by the formula:

n > 31 + k 

In this particular case n has to be larger than 31 + 4 = 35. 31 is a number given in textbooks so we cannot question that one. Just trust that it will work and don’t ask.
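The rule is easy to check mechanically. Here is a minimal Python sketch; the function name enough_observations is invented for this example.

```python
def enough_observations(n: int, k: int) -> bool:
    """Return True when n satisfies the rule n > 31 + k quoted above."""
    return n > 31 + k

k = 4  # F, C, H and S
print(enough_observations(35, k))  # False: 35 is not larger than 35
print(enough_observations(36, k))  # True
```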

There are 3 types of data we can work with, and they are as follows:

Time-series data – The Cat over time (hourly, daily, weekly...)

Cross-sectional data – a group of cats at the same time (The Cat, neighbor’s cat, granny’s cat...)

Panel data (time-series and cross-sectional data combined) – a group of cats over time

Once we obtain the data we put it into an Excel table.

Table #1: Time-series data

Table #2: Cross-sectional data

Make sure that the number of observations (number of rows) exceeds 35. If you don’t remember why, go back to the beginning of the ‘Gathering data’ section.

Having everything written down we can finally get to the model itself. 

5#: Creating the model – Least Squares Method (LS)

There are two ways to bring our model into existence.

a. Excel (feasible)
b. Gretl (easy)

Since there are no cats included in this part the sooner we get through a. the better. So the procedure goes as follows:

A. Copying the data

We copy our data table (let it be table #2) excluding the dependent variable (M). Then we add a column representing the constant. It has to be put at the front and consists of 1s only. I hope it goes without saying that the remaining columns should be filled with numbers. They are left blank here only for simplicity’s sake.

The table forms a matrix with the number of rows equal to the number of observations (n) and the number of columns equal to the number of independent variables (k) + 1. Therefore what we obtain is an n-by-(k + 1) matrix.
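For readers who prefer code to spreadsheets, the same matrix can be sketched in Python with NumPy. All the numbers below are invented for illustration, and six rows is of course far fewer than the minimum number of observations discussed earlier.

```python
import numpy as np

# Invented values for six observations, for illustration only.
F = np.array([50.0, 80.0, 20.0, 60.0, 90.0, 30.0])  # food, grams
C = np.array([10.0, 5.0, 0.0, 15.0, 20.0, 2.0])     # caress, minutes
H = np.array([1.0, 3.0, 5.0, 0.0, 2.0, 4.0])        # disturbance, annoyance units
S = np.array([1.0, 0.0, 0.0, 1.0, 1.0, 0.0])        # slept: 1, not slept: 0

# The constant column of 1s goes first, then the k = 4 regressors.
A = np.column_stack([np.ones(len(F)), F, C, H, S])

print(A.shape)  # (6, 5), i.e. n-by-(k + 1)
```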

B. Transposing the matrix

The next step is transposing the matrix, which means swapping its dimensions: the columns become rows and the other way round. In order to do so we copy the matrix and paste it using the 'Paste Special' command in Excel (right mouse click) with the Transpose option selected.

Matrix A: n-by-(k + 1)

Matrix A^T: (k + 1)-by-n

C. Multiplying A^T and A (order matters)

Once more Excel provides a ready solution: a special formula for multiplying matrices. The only catch is that you have to know the size of the resulting matrix before calculating it, in order to select the cells for the result. So if we multiply two matrices, with r being the number of rows and c the number of columns, the outcome will be:

r-by-c  x  r2-by-c2  =  r-by-c2   (possible only when c = r2)

(k+1)-by-n  x  n-by-(k+1)  =  (k+1)-by-(k+1)

Therefore, before multiplying, we need to select a rectangle as tall as the first matrix (A^T) and as wide as the second (A), here (k+1)-by-(k+1) cells. When using the MMULT formula you also have to remember to finish the operation by pressing Ctrl + Shift + Enter instead of clicking OK. As a result we should obtain a new matrix whose element (1,1) equals the number of observations (n), because the first column of A consists of 1s only.
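The transposition and the multiplication can be mirrored in Python with NumPy, where .T and the @ operator stand in for Paste Special and MMULT. This is only a sketch: the matrix below is filled with random numbers, not real cat data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 40, 4

# Random stand-in for the data matrix: a column of 1s plus k regressors.
A = np.column_stack([np.ones(n), rng.uniform(0, 100, size=(n, k))])

At = A.T      # (k + 1)-by-n
AtA = At @ A  # (k + 1)-by-(k + 1); the order of the factors matters

print(At.shape)   # (5, 40)
print(AtA.shape)  # (5, 5)
print(AtA[0, 0])  # 40.0, the number of observations
```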

D. Inverting A^T A

That shouldn’t pose any difficulty either. The procedure is the same as for multiplication: we select an area, find the proper formula (MINVERSE), indicate what we would like to invert and press Ctrl + Shift + Enter. The difference is that the newly created matrix (A^T A)^-1 has the same size as the original one, which makes this step far easier than what we’ve already done.
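A quick way to convince yourself that an inversion went well is to multiply the result by the original matrix and look for the identity matrix. The sketch below does that in Python with NumPy on random stand-in data; note that the inverse exists only if no column of A is a linear combination of the others.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 40, 4

# Random stand-in for the data matrix: a column of 1s plus k regressors.
A = np.column_stack([np.ones(n), rng.uniform(0, 100, size=(n, k))])
AtA = A.T @ A

# NumPy counterpart of the spreadsheet inversion step.
AtA_inv = np.linalg.inv(AtA)

# Inverse times original should be (numerically) the identity matrix.
identity_check = np.allclose(AtA_inv @ AtA, np.eye(k + 1), atol=1e-6)
print(AtA_inv.shape)  # (5, 5), same size as AtA
print(identity_check)
```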

There are only two more steps and we haven’t encountered anything tough yet. Since all that is left are two more multiplications, we shouldn’t be afraid, should we?

E. Multiplying the A^T matrix and the Mood vector

Once more we will use the matrix A^T calculated in the second step, only this time the other element of the operation is a vector. First we have to dig out the column representing the mood values (M), then again figure out the size of the result. If we define a vector as a matrix with the number of columns equal to 1, then:

(k+1)-by-n     x      n-by-1       =      (k+1)-by-1

So the result of the multiplication should be a vector (A^T M) with the number of rows equal to the number of independent variables + 1. Once again we repeat the procedure already described in the third step, that is: select cells, enter formula, indicate elements, Ctrl + Shift + Enter, voilà.

F. Multiplying the (A^T A)^-1 matrix and the A^T M vector

Another matrix-vector operation. Let’s just work out the size of the result (looking at the previous step we already know that it’s going to be a vector).

(k+1)-by-(k+1)       x      (k+1)-by-1    =    (k+1)-by-1   

What we get from this final multiplication are the parameters of our model. This is what we were looking for all this time. They are given in the form of a column of numbers. How do we put them into an equation like the one we wrote down at the beginning? That’s the easy part. Each consecutive cell corresponds to the next term of the formula, in the same order as the columns of matrix A: the constant first, then each variable in turn.

M = const + F + C – H + S + epsilon

M = 13 + 22.5F + 5C – 14.7H + 7.25S + epsilon
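The whole procedure, steps A to F, fits in a short Python sketch with NumPy. The data here are simulated, not observed: the regressors are random numbers in invented ranges, and the mood values are generated from the example parameters above plus random noise, so the estimates should land near those parameters without matching them exactly. The last lines compare the result with NumPy's own least-squares solver as a sanity check.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 40  # more than 31 + k = 35

# Simulated regressors (ranges are invented for the example).
F = rng.uniform(0, 100, n)               # food, grams
C = rng.uniform(0, 30, n)                # caress, minutes
H = rng.uniform(0, 10, n)                # disturbance, annoyance units
S = rng.integers(0, 2, n).astype(float)  # slept: 1, not slept: 0

# Step A: data matrix with the constant column of 1s in front.
A = np.column_stack([np.ones(n), F, C, H, S])

# Simulated mood: example parameters plus noise (an assumption of this sketch).
true_beta = np.array([13.0, 22.5, 5.0, -14.7, 7.25])
M = A @ true_beta + rng.normal(0, 1, n)

# Steps B to F: transpose, multiply, invert, multiply, multiply.
At = A.T
AtA_inv = np.linalg.inv(At @ A)
AtM = At @ M
beta = AtA_inv @ AtM

# Cross-check against NumPy's built-in least-squares solver.
beta_check, *_ = np.linalg.lstsq(A, M, rcond=None)
print(np.round(beta, 2))
print(np.allclose(beta, beta_check, atol=1e-6))
```

The column of estimates comes out in the order const, F, C, H, S, the same order as the columns of A.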

This is our model. Having it, we can substitute values for the independent variables in order to calculate what the cat’s mood will be. Therefore we can seize power over the cat’s mind by administering him a precise quantity of stimuli.
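Substituting values is plain arithmetic, as this small Python sketch shows. The function name predict_mood and the input values are hypothetical; the error term is left out because it cannot be known in advance.

```python
def predict_mood(f: float, c: float, h: float, s: int) -> float:
    """Plug stimuli into the example model (error term omitted)."""
    return 13 + 22.5 * f + 5 * c - 14.7 * h + 7.25 * s

# Hypothetical plan: 10 g of food, 2 min of caress, 1 annoyance unit, a nap.
mood = predict_mood(f=10, c=2, h=1, s=1)
print(round(mood, 2))  # 240.55
```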

The good news is that we don’t have to use this toilsome procedure: there is a program that will calculate virtually all of that for us. What’s left is to learn how to feed the data into that software. Then we’ll get to know how to interpret the results and how to tell a good model from a bad one.