1#:
Encountering a research problem
2#:
Defining a research problem.
The cat is in a bad mood. The cat poses a threat to the furniture. We like the furniture. We want the cat to be in a good mood. We have to provide the cat with mood enhancers and prevent The Annoying. We do not know to what extent various stimuli influence the cat. We are in need of a simulation – an econometric model.
3#:
Choosing variables
There are two types of variables:
a. Dependent variable – the one we are about to examine (the cat’s mood), tagged M
b. Independent variables (regressors) – all of the hypothetical mood enhancers and reducers:
food (given in grams), tagged F
caress (given in minutes), tagged C
human disturbance (given in annoyance units), tagged H
sleep (has slept – good; hasn’t – bad), tagged S
While the first 3 variables are quantitative (their values are expressed in units), the variable S is a qualitative one. It means that its value can only be 1 (if the cat has slept) or 0 (if he hasn’t).
To make this more comprehensible we put all of the above-mentioned variables into an equation, with pluses preceding the enhancers and minuses in front of the reducers. Thus:
M = F + C – H + S
To please the academics we have to put the Greek letter epsilon at the end of the equation. It stands for the 'error term' and represents all the things that we could have possibly messed up in the formula.
Moreover, at the front of the right-hand side of the equation we put a constant. We do not yet know what it’s going to be, but for sure at the end of the modeling process it will be there. Therefore the formula should look as follows:
M = const + F + C – H + S + epsilon
It indicates that the value of M depends on how big the other variables are. This relationship is called regression – it’s the most basic concept behind the whole science of econometrics.
4#:
Gathering data
To gather data we have to resort to internet sources or (which is sometimes impossible) write down our own observations. Regardless of the chosen method, there is a fixed minimum number of observations (n) that the model requires to work properly. It depends on the number of independent variables (k) included and is given by the formula:
n > 31 + k
In this particular case n has to be larger than 31 + 4 = 35. 31 is a number given in textbooks, so we cannot question that one. Just trust that it will work and don’t ask.
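The rule above is easy to check before any modeling starts. Here is a minimal sketch in Python; the function names are made up for illustration, and the threshold 31 + k is simply the one quoted above, not something derived here.

```python
def min_observations(k: int) -> int:
    """Smallest whole number n that satisfies the rule n > 31 + k."""
    return 31 + k + 1


def enough_observations(n: int, k: int) -> bool:
    """True when n observations are enough for k independent variables."""
    return n > 31 + k


k = 4  # F, C, H and S
print(min_observations(k))          # 36
print(enough_observations(35, k))   # False
print(enough_observations(36, k))   # True
```

So for the cat model, 35 rows are still one short and 36 is the first number that passes.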
There are 3 types of data we can work with and they are as follows:
Time series data – one cat over time
Cross-sectional data – a group of cats at the same time (The Cat, neighbor’s cat, granny’s cat...)
Panel data (time series and cross-sectional data combined) – a group of cats over time
Once we obtain the data we put it into an Excel table.
Table #2: Cross-sectional data
Make sure that the number of observations (number of rows) exceeds 35. If you don’t remember why, go back to the beginning of the ‘Gathering data’ section.
Having everything written down we can finally get to the model itself.
5#:
Creating the model – Least Squares Method (LS)
There are two ways to bring our model into existence.
a. Excel, by hand (toilsome)
b. Gretl (easy)
Since there are no cats included in this part, the sooner we get through a. the better. So the procedure goes as follows:
A. Copying the data
We copy our data table (let it be table #2), excluding the dependent variable (M). Then we add a column representing the constant. It has to be put at the front and consists of 1s only. I hope it goes without saying that the remaining columns should be filled with numbers; it is only for simplicity’s sake that they are not.
The table forms a matrix, which we will call A, with the number of rows equal to the number of observations (n) and the number of columns equal to the number of independent variables (k) + 1. Therefore what we obtain is an n-by-(k + 1) matrix.
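For readers who prefer code to spreadsheets, the same step can be sketched in Python. The observation values and the name design_matrix are hypothetical; only the shape of the result matters here.

```python
# Hypothetical observations, one row per cat: (F grams, C minutes, H units, S 1/0).
observations = [
    (50.0, 10.0, 2.0, 1),
    (30.0, 0.0, 5.0, 0),
    (80.0, 25.0, 1.0, 1),
]


def design_matrix(rows):
    """Put a constant 1 in front of every observation row."""
    return [[1.0, *map(float, row)] for row in rows]


A = design_matrix(observations)
n = len(A)             # number of observations
k = len(A[0]) - 1      # number of independent variables
print(n, k + 1)        # 3 5
```

Three rows are of course far too few for a real model; the point is only that the result has n rows and k + 1 columns, with the column of 1s at the front.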
B. Transposing the matrix
The next step is transposing the matrix, meaning that we swap its dimensions: the columns become rows and the other way round. The result is the matrix A^{T}. In order to do so we copy the matrix and paste it using the 'Paste Special' command in Excel (right mouse click) with the Transpose option ticked.
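Outside Excel, transposition is a one-liner. A small sketch in Python, where the 2-by-3 matrix is made up just to show the dimensions swapping:

```python
def transpose(matrix):
    """Turn rows into columns and columns into rows."""
    return [list(column) for column in zip(*matrix)]


A = [
    [1.0, 50.0, 10.0],
    [1.0, 30.0, 0.0],
]
A_T = transpose(A)
print(A_T)  # [[1.0, 1.0], [50.0, 30.0], [10.0, 0.0]]
```

A 2-by-3 matrix goes in and a 3-by-2 matrix comes out, exactly as 'Paste Special' would do it.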

C. Multiplying A^{T} and A (order matters)
Once again Excel provides a ready solution: it includes a special formula for multiplying matrices. The only catch is that you have to know the size of the resulting matrix before calculating it, in order to select the space for the result. So if we multiply two matrices, r being a number of rows and c a number of columns, the outcome will be:
r-by-c x r2-by-c2 = r-by-c2 (possible only when c = r2)
(k+1)-by-n x n-by-(k+1) = (k+1)-by-(k+1)
Therefore before multiplying we need to select a rectangle as high as the first matrix and as wide as the second one. When using the MMULT formula you also have to remember to finish the operation by pressing ctrl + shift + enter instead of clicking OK. As the result we should obtain a new matrix with the element (1,1) equal to the number of observations (n).
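The size rule and the (1,1) check can be tried out on a toy example. This is only a sketch in Python with a made-up 3-by-2 matrix (n = 3, k = 1); the helper names are not taken from any library.

```python
def transpose(matrix):
    """Turn rows into columns and columns into rows."""
    return [list(column) for column in zip(*matrix)]


def matmul(left, right):
    """Multiply an r-by-c matrix by an r2-by-c2 matrix (needs c == r2)."""
    if len(left[0]) != len(right):
        raise ValueError("columns of the first must equal rows of the second")
    return [
        [sum(a * b for a, b in zip(row, column)) for column in zip(*right)]
        for row in left
    ]


# Made-up matrix A: a column of 1s plus one independent variable.
A = [
    [1.0, 2.0],
    [1.0, 3.0],
    [1.0, 5.0],
]
AtA = matmul(transpose(A), A)
print(AtA)                   # [[3.0, 10.0], [10.0, 38.0]]
print(AtA[0][0] == len(A))   # True
```

The result is (k+1)-by-(k+1), here 2-by-2, and its top-left element equals n because it is just the column of 1s multiplied by itself.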
D. Inverting A^{T}A
That shouldn’t pose any difficulty either. The procedure is the same as in multiplication: we select an area, find the proper formula (MINVERSE), indicate what we would like to invert and press ctrl + shift + enter. The difference is that the newly created matrix (A^{T}A)^{-1} has the same size as the original one, which makes it far easier than what we’ve already done.
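What Excel hides behind its inverse formula can be sketched by hand. The Python below uses Gauss-Jordan elimination with partial pivoting, which is one common way to invert a small matrix and not necessarily what Excel does internally; the 2-by-2 input is a made-up example.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination."""
    size = len(matrix)
    # Work on the augmented matrix [matrix | identity].
    aug = [
        [float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(size)]
        for i, row in enumerate(matrix)
    ]
    for col in range(size):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular and cannot be inverted")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Clear this column in every other row.
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    # The right half now holds the inverse.
    return [row[size:] for row in aug]


AtA = [[3.0, 10.0], [10.0, 38.0]]
AtA_inv = invert(AtA)
print(len(AtA_inv), len(AtA_inv[0]))  # 2 2
```

The inverse has the same size as the input, and for this example it should come out close to (1/14) times [[38, -10], [-10, 3]].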
There are only two more steps and we haven’t encountered anything tough yet. Since both of them are just multiplications, we shouldn’t be afraid, should we?
E. Multiplying the A^{T} matrix and the Mood vector
One more time we will use the matrix A^{T} calculated in the second step, only this time the other element of the operation is a vector. First we have to dig out the column representing mood values (M), then again figure out the size of the result. If we treat a vector as a special matrix with the number of columns equal to 1, then:
(k+1)-by-n x n-by-1 = (k+1)-by-1
So the result of the multiplication should be a vector (A^{T}M) with the number of rows equal to the number of independent variables + 1. Once again we repeat the procedure already described in the third step, that is: select cells, enter formula, indicate elements, ctrl + shift + enter, voilà.
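Treating the mood column as an n-by-1 matrix means the very same multiplication routine works for it. A short sketch in Python with made-up numbers (n = 3, k = 1); the helper names are illustrative only.

```python
def transpose(matrix):
    """Turn rows into columns and columns into rows."""
    return [list(column) for column in zip(*matrix)]


def matmul(left, right):
    """Multiply two matrices given as lists of rows."""
    if len(left[0]) != len(right):
        raise ValueError("columns of the first must equal rows of the second")
    return [
        [sum(a * b for a, b in zip(row, column)) for column in zip(*right)]
        for row in left
    ]


# Made-up data: a column of 1s plus one variable, and three mood values.
A = [[1.0, 2.0], [1.0, 3.0], [1.0, 5.0]]
M = [[5.0], [7.0], [11.0]]   # a vector written as an n-by-1 matrix

AtM = matmul(transpose(A), M)
print(AtM)  # [[23.0], [86.0]]
```

The result has k + 1 = 2 rows and a single column, and its first element is simply the sum of all mood values.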
F. Multiplying the (A^{T}A)^{-1} matrix and the A^{T}M vector
Another matrix-vector operation. Let’s just work out the size of the result (looking at the previous step we already know that it’s going to be a vector).
(k+1)-by-(k+1) x (k+1)-by-1 = (k+1)-by-1
What we get from this final multiplication are the parameters of our model. This is what we were looking for all this time. They are given in the form of a column of numbers. How do we put them into an equation like the one we wrote down at the beginning? That’s the easy part. Each consecutive cell corresponds to the subsequent term of the formula.
M = const + F + C – H + S + epsilon
M = 13 + 22.5F + 5C – 14.7H + 7.25S + epsilon
This is our model. Having it, we can substitute values for the independent variables in order to calculate what the cat’s mood will be. Therefore we can seize power over the cat’s mind by administering him a precise quantity of stimuli.
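Steps A to F, plus the final substitution, can be sketched end to end in plain Python. Everything here is an assumption made for illustration: the eight observations are invented (far fewer than the 36 the rule asks for), the moods are generated without any error term from the example parameters above, and the helper names are hypothetical. Note that the minus in front of H shows up as a negative parameter in the resulting column.

```python
def transpose(matrix):
    return [list(column) for column in zip(*matrix)]


def matmul(left, right):
    if len(left[0]) != len(right):
        raise ValueError("columns of the first must equal rows of the second")
    return [[sum(a * b for a, b in zip(row, column)) for column in zip(*right)]
            for row in left]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    size = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular and cannot be inverted")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def fit(observations, moods):
    """Return the parameter column (A^T A)^-1 A^T M as a flat list."""
    A = [[1.0, *row] for row in observations]         # step A: add the 1s
    A_T = transpose(A)                                # step B
    AtA_inv = invert(matmul(A_T, A))                  # steps C and D
    AtM = matmul(A_T, [[m] for m in moods])           # step E
    return [row[0] for row in matmul(AtA_inv, AtM)]   # step F


def predict(params, food, caress, disturbance, slept):
    """Substitute values into M = const + F + C + H + S with fitted parameters."""
    const, b_f, b_c, b_h, b_s = params
    return (const + b_f * food + b_c * caress
            + b_h * disturbance + b_s * (1 if slept else 0))


# Invented observations: (F grams, C minutes, H annoyance units, S 1/0).
observations = [
    (50, 10, 2, 1), (30, 0, 5, 0), (80, 25, 1, 1), (60, 5, 3, 1),
    (20, 15, 4, 0), (70, 20, 0, 1), (40, 30, 6, 0), (90, 8, 2, 0),
]
EXAMPLE_PARAMS = [13.0, 22.5, 5.0, -14.7, 7.25]  # the example model above
moods = [predict(EXAMPLE_PARAMS, *row) for row in observations]

params = fit(observations, moods)
print([round(p, 2) for p in params])
print(round(predict(params, 10, 5, 2, True), 2))
```

Because the invented moods contain no error term, the fitted parameters should land very close to the ones used to generate them; with real observations they would only be estimates.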
The good news is that we don’t have to use this toilsome procedure: there is a program that will calculate virtually all of that for us. What’s left is to learn how to feed the data into that software. Then we’ll get to know how to interpret the results and how to tell a good model from a bad one.