Third Assignment: Numerical Simulation

Random Trend Model Estimation for Panel Data

Carlos Gonzales (536572)

Daniel Dimitrov (275140)

Jakob Schwerter (110583)


Abstract

In this study we look at some foundational panel data models. Using different software packages we perform numerical simulations based on a range of data generating processes. Our main interest is to find an unbiased estimator when a linear trend has introduced bias into the classical panel estimation methods. We evaluate the consistency and efficiency of several alternatives for estimating a random-trend model. We compare two main competing approaches for eliminating the trend: (i) first-differencing the data twice and (ii) a within transformation after first-differencing once. We evaluate the robustness of both and find that the mixed approach (ii) may be preferable as long as the researcher can accept certain limitations. We also use this study to compare the effectiveness of Stata (a standard package for econometric work) against open-source Python libraries such as numpy and pandas. We conclude that the open-source libraries are not yet rich enough to provide all the options that specialized econometrics software offers, but we expect Python to catch up quickly in the near future. Extensions such as ipystata can offer researchers a convenient segue towards this very powerful language. To illustrate some conclusions from our data visually we also use plotly.

Contents

1. Motivation

2. Simulation and Estimation with Stata

2.1. Preparing ipystata

2.2. Overview of the DGP

2.3. Simulation without Random Trend in the true DGP

2.4. Monte Carlo study without trend in the DGP

2.5. Monte Carlo study with the same trend for all individuals

2.6. Monte Carlo study with individual specific linear trends (Random Trend)

2.7. Monte Carlo study with nonlinear individual specific trends

3. Random Trend Simulation using Python

3.1. DGP

3.2. Data Transformation

3.3. Regression

3.4. Monte Carlo Study

3.5. Graphical Comparison of the Estimation Methods

4. Conclusion

4.1. Comparison between models

4.2. Comparison between Stata and Python


1. Motivation

We construct a simulation for a random-trend (RT) model as defined by Wooldridge (2010), section 11.7.1. The model is an extension of the first-difference (FD) model for panel data and overcomes some of its deficiencies. Our main interest is to find an unbiased estimator after a linear random trend has introduced bias into the classical panel estimation methods.

1.1. Basic Panel Data Model

The basic model to estimate a dependent variable follows a linear panel data specification. The specific model is as follows:


$y_{it} = \beta \cdot x_{it} + u_{it}$ (1)

As we are working with a panel, the index $i$ stands for an individual (person, company, country, etc.), while the index $t$ represents time. The dependent and the independent variable are indicated by $y_{it}$ and $x_{it}$ respectively, and both vary over individuals and over time. The error term is represented by $u_{it}$. It can be decomposed into two terms:


$u_{it} = \alpha_i + e_{it}$

In this context $\alpha_i$ is an unobserved individual specific effect that stays constant for each individual, and $e_{it}$ is an idiosyncratic error that changes both over time and over individuals. In practice we are mostly interested in estimating the effect of $x$ on $y$, so we are looking for an unbiased and consistent estimate of $\beta$.

Given that both $x$ and $y$ are correlated with the individual effect $\alpha_i$, running plain OLS without any data transformation (a pooled regression) will give biased results due to endogeneity.

Two methods can be used to deal with the endogeneity problem induced by the constant error component: the fixed-effects (FE) and the first-differences (FD) method.

The fixed-effects (FE) method uses the so-called within transformation, subtracting from each variable its mean for the corresponding individual. This eliminates the individual-specific effect, which is constant over time (here: $\alpha_i$), and allows for consistent OLS estimation:


$y_{it} - \bar{y}_i = \beta \cdot (x_{it} - \bar{x}_i) + (\alpha_i - \alpha_i) + (e_{it} - \bar{e}_i)$ (2)

with $\bar{y}_i = \frac{1}{T}\sum_{t=1}^T y_{it}$, $\bar{x}_i = \frac{1}{T}\sum_{t=1}^T x_{it}$ and $\bar{e}_i = \frac{1}{T}\sum_{t=1}^T e_{it}$

The first-differences method controls for the bias by subtracting the previous observation from the current one:


$y_{it} - y_{it-1} = \beta \cdot (x_{it} - x_{it-1}) + (\alpha_i - \alpha_i) + (e_{it} - e_{it-1})$ (3)

One can clearly see that in both methods $\alpha_i$ cancels out of the regression, removing any correlation between $\alpha_i$ and $y_{it}$ or $x_{it}$ as a source of bias. Still, a crucial requirement for retrieving reliable estimates is that the independent variable and the idiosyncratic term remain uncorrelated in expectation.
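One standard way to formalize this requirement is the strict exogeneity assumption, under which both transformations deliver consistent estimates:

$E[e_{it} \mid x_{i1}, \dots, x_{iT}, \alpha_i] = 0 \quad \text{for } t = 1, \dots, T$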

Things get more complicated, however, when besides the unobserved constant $\alpha_i$ the variable of interest $x_{it}$ is also correlated with a linear trend $g_i \cdot t$. Keeping the variable names as before, the error term can now be decomposed as:


$u_{it} = \alpha_i + g_i \cdot t + e_{it}$

Now, $g_i$ is a linear trend which is specific to each individual. Note that if $y_{it}$ is the logarithm of the original variable, $g_i$ can also be interpreted as roughly the average growth rate over a period. In that case the equation is usually referred to as a random-growth model, otherwise simply as a random-trend model. Overall, this presents an additional source of heterogeneity that needs to be dealt with before employing an OLS estimation.

To solve the possible bias due to the RT component, the literature suggests taking first differences as a first step in order to transform the linear trend into a constant. To illustrate this more formally, taking the first difference in the RT set-up gives


$y_{it} - y_{it-1} = \beta \cdot (x_{it} - x_{it-1}) + (\alpha_i - \alpha_i) + g_i \cdot t - g_i \cdot (t-1) + (e_{it} - e_{it-1})$


$\Delta y_{it} = \beta \cdot \Delta x_{it} + g_i + \Delta e_{it}$

Thereafter, it is up to the researcher to continue with either the within transformation or a second first difference. Note that even though both are fixed-effects-type methods, we follow the literature and reserve the term fixed-effects for the within transformation. We will investigate whether one of the approaches is superior to the other in terms of the estimation bias of the coefficient. It would further be of interest to investigate the standard errors and model selection criteria like $R^2$, AIC and BIC, but this is left for future research.

As we saw, the linear trend has been reduced to a constant term, which can now be canceled out by a second first difference or a within transformation. So we have two possibilities for estimating the RT model:

- The FD method: taking the first difference twice, named pure in the tables later on
- The FD-FE method: first the first difference, then the fixed-effects (within) transformation, named mix

Differencing a second time leads to the FD method:


$\Delta^2 y_{it} = \beta \cdot \Delta^2 x_{it} + \Delta^2 e_{it}$ (4)
where $\Delta^2$ stands for taking the first difference twice.
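To make the cancellation explicit: applying the difference operator once more to the first-differenced equation above removes $g_i$, since it does not vary over time:

$\Delta y_{it} - \Delta y_{it-1} = \beta \cdot (\Delta x_{it} - \Delta x_{it-1}) + (g_i - g_i) + (\Delta e_{it} - \Delta e_{it-1}) = \beta \cdot \Delta^2 x_{it} + \Delta^2 e_{it}$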

The alternative is to apply an FE transformation to each first-differenced variable by subtracting from it the mean for the corresponding individual. This leads to the FD-FE method:


$\Delta y_{it}' = \beta \cdot \Delta x_{it}' + \Delta e_{it}'$ (5)

where $'$ denotes that the first-differenced variables are demeaned by individual.

One can clearly see that in (4) and (5) we cancel out both terms $\alpha_i$ and $g_i \cdot t$. Thereby $\beta$ will not be biased even though the data initially included two different sources of bias.

On the other hand, if we had failed to take first differences in the first place, applying the fixed-effects (within) transformation first, as in equation (2), would have given us the following:


$y_{it} - \bar{y}_i = \beta \cdot (x_{it} - \bar{x}_i) + (\alpha_i - \alpha_i) + g_i \cdot (t - \bar{t}) + (e_{it} - \bar{e}_i)$

A second transformation will then no longer be able to cancel out the time-trend effect (we confirm this in the simulation later on). We thereby see that it is crucial to take the first difference first rather than the within transformation.
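To see this in our setting, first-difference the demeaned equation above: the trend term turns into the constant $g_i$ instead of disappearing,

$\Delta(y_{it} - \bar{y}_i) = \beta \cdot \Delta(x_{it} - \bar{x}_i) + g_i + \Delta(e_{it} - \bar{e}_i),$

and since $x_{it}$ is by assumption correlated with $g_i \cdot t$, the remaining $g_i$ stays correlated with the transformed regressor, so a plain OLS on these data remains biased; demeaning a second time instead changes nothing, as the data are already demeaned.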

Given that our group is new to Python but experienced in Stata, we will first run the simulation via Stata using the package ipystata. This gives us a familiar language that we can use as a benchmark. In a second step we will use only open-source packages to replicate the results, commenting on which Python code is comparable to which Stata code. Besides providing the Stata benchmark, this also serves as a small translator from Stata to Python. We will also comment on the speed of both languages as well as advantages and disadvantages of the coding.

The assignment continues as follows: (i) first we explain our data generating process (DGP) step by step; (ii) then we run a single simulation without a random trend to see if everything works fine. As a single simulation is not enough to produce definitive answers, from there on we perform a Monte Carlo study with numerous simulation draws: first with the basic DGP without a trend, (iii) next with a constant linear trend and (iv) with an individual specific trend in the data generating process. To see how robust the random-trend estimates are, we further run two simulations using two non-linear trends. Finally, we reproduce the results for the individual trend using only open-source Python packages such as numpy, pandas and random.

1.2. Setting up Python packages for this study

First we import the computational packages that will be used for this study. Note that ipystata, even though an open source package itself, relies on Stata libraries and as a result requires a Stata license and installation on your computer.

After that we install and activate a spell checker in order to keep typos under control. We use the Calico spell-check extension for this purpose.

import ipystata # https://github.com/TiesdeKok/ipystata - We kind of helped so that ipystata works on mac as well :)
import pandas as pd
import random


from plotly import __version__
from plotly.offline import download_plotlyjs, init_notebook_mode, iplot

print __version__ # requires version >= 1.9.0

import plotly
plotly.offline.init_notebook_mode()

from plotly.offline import plot
from plotly.graph_objs import Scatter
from plotly.graph_objs import *
import numpy as np
import plotly.plotly as py
import plotly.graph_objs as go
import cufflinks as cf
init_notebook_mode()
import plotly.tools as tls


tls.set_credentials_file(username="jaschwer", api_key="aixfdrxn9e")
#tls.set_credentials_file(username="DanielKDimitrov", api_key="21xohc8xmw")
#py.sign_in('chalogonzalez12','q6mqfcy6yp')

# These packages were not used in the end, but we keep them here so they are easier to find for later projects
#from scipy import stats
#import statsmodels.api as sm
#from statsmodels.sandbox.regression.predstd import wls_prediction_std

# Different plot-package
#import matplotlib.pyplot as plt
#%matplotlib inline


# pip install ipystata --upgrade --force-reinstall
# Check if a new version is online before running the code.
# Might be useful and does not take much time
1.9.7
#install spellchecker
!ipython install-nbextension https://bitbucket.org/ipre/calico/downloads/calico-spell-check-1.0.zip
#after you install it, activate it executing the next step.     
#this video provides a demo of how it works after you install it: https://www.youtube.com/watch?v=Km3AtRynWFQ
[TerminalIPythonApp] WARNING | Subcommand `ipython install-nbextension` is deprecated and will be removed in future versions.
[TerminalIPythonApp] WARNING | You likely want to use `jupyter install-nbextension`... continue in 5 sec. Press Ctrl-C to quit now.
downloading https://bitbucket.org/ipre/calico/downloads/calico-spell-check-1.0.zip to /var/folders/fq/w4v55vm52pvg18kzm9tvtddm0000gn/T/tmpVrzZGw/calico-spell-check-1.0.zip
extracting /var/folders/fq/w4v55vm52pvg18kzm9tvtddm0000gn/T/tmpVrzZGw/calico-spell-check-1.0.zip to /usr/local/share/jupyter/nbextensions
%%javascript
require(['base/js/utils'],
function(utils) {
   utils.load_extensions('calico-spell-check');
});
<IPython.core.display.Javascript object>

2. Simulation and Estimation with Stata

2.1. Preparing ipystata

Before we proceed with setting up our DGP we will first make sure that ipystata has been successfully installed and is working well within the python environment. The first two cells are to call Stata and to see if it actually works within the Python environment. In the third cell you have to provide a path to a folder on your computer so that the system can use it to store data from the simulations.

from ipystata.config import config_stata
config_stata('/Applications/Stata/StataSE.app/Contents/MacOS/StataSE') 

# Windows  --> 'C:\Program Files (x86)\Stata14\StataSE-64.exe'
# Mac OS X --> '/Applications/Stata/StataSE.app/Contents/MacOS/stataSE'
# Linux    --> '/home/user/stata14/stata-se'
%%stata 
display "Hello, I am printed by Stata" 
display "We welcome you to our assignment and we hope you enjoy it"
Hello, I am printed by Stata
We welcome you to our assignment and we hope you enjoy it
%%stata 
cd "/Users/jakobschwerter/Dropbox/github/python/Stata/Assignment"
    * cd "D:\Google Doc\_Tilburg\_Simulation\stata" 
set seed 100
/Users/jakobschwerter/Dropbox/github/python/Stata/Assignment

2.2. Overview of the DGP

We will now go through the main part of the simulation process. Minor changes or additions to the process later on will be explained with Stata's commenting functionality within the code, which is done by " * ".

First we have to call Stata using %%stata. Then by using cd we indicate where it should look for files and where it can save them.

We further set the seed to 100 so that the results can be reproduced.

%%stata -o simulation
drawnorm alpha_i, n(200)
(obs 200)

We create a new dataset called simulation using -o. Its type is a pandas DataFrame. This is necessary for ipystata on Mac, because the data would not be stored otherwise.

Then we draw 200 independent and identically distributed (i.i.d.) random numbers from a standard normal distribution with mean zero and variance 1. We name the variable alpha_i ($\alpha_i$). In the context of panel data, $\alpha_i$ stands for the unobserved time-invariant individual effect.

%%stata -d simulation -o simulation
expand 5
(800 observations created)

-d loads the dataset simulation; otherwise the subsequent commands would not find any variables. The option -o simulation saves the changes.

We expand the generated data by 5, i.e. every alpha is simply repeated 4 additional times. This creates 800 additional observations.

As a result, the vector expands from (200 x 1) to (200 ⋅ 5 x 1), giving 1,000 observations in total, each described by a randomly generated measure. Expanding ensures that for each individual the measure stays constant over time, thus replicating a time-invariant individual effect.

%%stata -d simulation -o simulation
drawnorm nu_it e_it, n(1000)

We now draw two new variables (vectors) with 1,000 entries each from a multivariate normal distribution. 1,000 draws are made so that this fits the 200 persons and 5 time periods we created by expanding the alphas. The two variables are independent of each other, so they are not correlated, and the data are also i.i.d. over time.
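Anticipating the Python replication in section 3, the Stata steps so far translate roughly into the following numpy calls (a sketch only; the full version follows in section 3.1):

import numpy as np

np.random.seed(100)                     # set seed 100
alpha_i = np.random.normal(0, 1, 200)   # drawnorm alpha_i, n(200)
alpha_i = np.repeat(alpha_i, 5)         # expand 5: repeat each alpha for the 5 periods
nu_it = np.random.normal(0, 1, 1000)    # drawnorm nu_it e_it, n(1000)
e_it = np.random.normal(0, 1, 1000)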

To verify that we did everything correctly, we have a look at the summary statistics:

%%stata -d simulation
sum 
    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
       index |      1000       499.5    288.8194          0        999
     alpha_i |      1000   -.0376166    1.014596  -2.885089    2.10282
       nu_it |      1000    .0101147    1.027895  -2.885089   3.081883
        e_it |      1000   -.0310172    .9936022  -2.741925   3.359478

We see that the three generated variables $\alpha_i$, $\nu_{it}$ and $e_{it}$ have means very close to zero and standard deviations close to 1. Further, when using Stata from Python an index variable is generated automatically; it would serve as the observation identifier if we were to run a pooled panel regression.
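Since -o simulation hands the data back to Python as a pandas DataFrame, the same check can be run on the Python side; a minimal sketch, assuming the cells above have been executed:

print simulation.describe()   # pandas analogue of Stata's `sum`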

%%stata -d simulation -o simulation
g x_it = nu_it + alpha_i

This generates our variable of interest $x_{it}$ from the two variables created in the previous steps. Since the time-invariant $\alpha_i$ is part of it, $x_{it}$ will be correlated over time.

%%stata -d simulation -o simulation
g y_it=3+alpha_i+2*x_it+e_it

This generates our dependent variable $y_{it}$ from an intercept, $\alpha_i$, the explanatory variable $x_{it}$ and the idiosyncratic error $e_{it}$. The true regression coefficient on $x_{it}$ is 2.

The underlying model is a completely balanced panel consisting of 200 individuals and 5 time periods per person. The variables $y_{it}$ and $x_{it}$ are both correlated with the constant term $\alpha_i$, which produces autocorrelation within an individual.

Checking if the code worked:

%%stata -d simulation
pwcorr,sig
             |    index  alpha_i    nu_it     e_it     x_it     y_it
-------------+------------------------------------------------------
       index |   1.0000 
             |
             |
     alpha_i |   0.0079   1.0000 
             |   0.8022
             |
       nu_it |   0.0283   0.2159   1.0000 
             |   0.3711   0.0000
             |
        e_it |  -0.0082  -0.0082  -0.0185   1.0000 
             |   0.7967   0.7949   0.5600
             |
        x_it |   0.0233   0.7764   0.7830  -0.0172   1.0000 
             |   0.4612   0.0000   0.0000   0.5880
             |
        y_it |   0.0180   0.8425   0.6525   0.2253   0.9579   1.0000 
             |   0.5701   0.0000   0.0000   0.0000   0.0000
             |

The command pwcorr shows a correlation table of the previously generated variables in the dataset; the option sig prints the significance level of each correlation. Results are presented above. There is a statistically significant correlation between $x_{it}$ and $\alpha_i$ (0.7764). As a result, not accounting for $\alpha_i$ when regressing $y_{it}$ on $x_{it}$ in a panel model will produce biased estimates. The correlation of $y_{it}$ with $x_{it}$ and $\alpha_i$ is naturally very high, since $y_{it}$ depends on both variables in our set-up. The correlation of $e_{it}$ with $x_{it}$ and $\alpha_i$ is very low and statistically insignificant, which holds by construction and is necessary for consistency. A pooled OLS regression of $y_{it}$ on $x_{it}$ will be upward biased since the correlation between $\alpha_i$ and $x_{it}$ is positive, so the estimated $\beta$ picks up the effect of both $\alpha_i$ and $x_{it}$. If the correlation were negative, the estimate would be downward biased.
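In fact, the size of the bias can be anticipated from the DGP. For pooled OLS the standard omitted-variable result gives

$\text{plim}\ \hat{\beta}_{pool} = \beta + \frac{Cov(x_{it}, \alpha_i)}{Var(x_{it})} = 2 + \frac{1}{2} = 2.5,$

since $x_{it} = \nu_{it} + \alpha_i$ with $\nu_{it}$ and $\alpha_i$ independent standard normal draws, so that $Cov(x_{it}, \alpha_i) = 1$ and $Var(x_{it}) = 2$. This matches the pooled estimates of roughly 2.5 found in the simulations below.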

To check if the DGP is correct, we do one simulation without imposing a trend into the DGP.

2.3. Simulation without Random Trend in the true DGP

Now we run the DGP defined earlier and verify how well different estimation procedures are able to recover the parameters we have embedded, namely an intercept of 3 and a $\beta$ of 2. We evaluate the following panel data methods:

%%stata -o sim
qui set seed 123
qui drawnorm alpha_i, n(200)
qui gen i = _n 
    * included additional to generate an index for the individual level
qui expand 5
qui bys i: g t = _n 
    * included additional to generate an index for the time level
qui drawnorm nu_it e_it, n(1000)
qui g x_it = nu_it + alpha_i
qui g y_it = 3 + alpha_i + 2*x_it + e_it 
    * DGP

qui xtset i t 
    * give stata panel-information

qui gen dx_it = d.x_it 
    * generates the first-difference for x prior to the regression command
qui gen dy_it = d.y_it 
    * generates the first-difference for y prior to the regression command


qui eststo: reg y_it x_it, cluster(i) 
    * OLS-regression (biased) (eststo saves the regression results)
qui eststo: xtreg  y_it x_it, fe  
    * FE-regression (OLS, but first the within-transformation is done)
qui eststo: reg d.y_it d.x_it, 
    * FD-regression
qui eststo: xtreg d.y_it d.x_it,fe 
    * Random-Trend 1 (mix)
qui eststo: reg d2.y_it d2.x_it,  
    * Random-Trend 2 (pure)  

esttab, long compress nogaps rename(D.x_it x_it D2.x_it x_it D.dx_it x_it) drop(_cons) b(%7,3f) se(%6.4f) scalars("N N" "F F-Stat" "p p-value" "r2 r2") sfmt(%4,0f %5,0f %5,2f %6,2f) star(* 0.001) label mtitles( "Pool" "FE" "FD" "FD-FE" "FD-FD" ) title("Without a random trend component") nolabel
    * esttab produces the table

Without a random trend component
-----------------------------------------------------------------------
                       (1)        (2)        (3)        (4)        (5) 
                      Pool         FE         FD      FD-FE      FD-FD 
-----------------------------------------------------------------------
x_it                 2,500*     2,052*     2,002*     1,989*     1,995*
                  (0.0312)   (0.0338)   (0.0354)   (0.0412)   (0.0422) 
-----------------------------------------------------------------------
Observations          1000       1000        800        800        600 
F-Stat                6403       3675       3192       2331       2238 
p-value               0,00       0,00       0,00       0,00       0,00 
r2                    0,89       0,82       0,80       0,80       0,79 
-----------------------------------------------------------------------
Standard errors in parentheses
* p<0.001

We can see that the results of the plain OLS are biased, i.e. the coefficient on $x_{it}$ is not 2 but around 2.5. For FE and FD, however, the coefficient is very close to 2. Both random-trend (RT) approaches are also unbiased; of the two, the mix estimate is farther from the true parameter, and both have the highest standard errors. So we can conclude that even if there is no random-trend component, applying two transformations does not really harm the estimation.

Comparing the two RT approaches, the mix seems to be slightly more efficient, while the pure is closer to the true value. This is consistent with the comparison between the plain FE and FD.

2.4. Monte Carlo study without trend in the DGP

With a single simulation we need to keep in mind that the results from just one run can be misleading, as they may be driven by chance if the seed in use simply draws numbers that work better for one method than for another. Therefore, we now write a program for a Monte Carlo study with 1,000 replications to see whether the above results are robust.

What we add in the code now is:

  1. A timer to measure how long the code runs. The necessary commands are timer on 1, which starts the timer called '1', timer off 1, which stops it, and timer list 1, which displays the time needed.
  2. For the simulate command, we first have to embed the code in a program. The first line capture program drop mcprog_pool drops the program if it is already defined, which is useful if you run the code more than once. program mcprog_pool starts the program, which ends with end.

The DGP in each of the following simulations stays the same, namely


$y_{it} = 3 + \alpha_i + 2 \cdot x_{it} + e_{it}$

What we change in each of the Monte Carlo specifications is the estimation procedure, applying each of the procedures discussed so far. In the code the following notation is used: pool stands for the pooled regression, fe for the within transformation, fd for the first difference, mix for the random-trend estimation using first differences and fixed effects, and pure for the random-trend estimation with two first differences.

The simulation results are stored in _b for the coefficient and _se for the standard error. Note that the Monte Carlo standard deviation of the estimated $\beta$ differs conceptually from the standard errors presented in the table above: the former uses only the variation of beta across replications, whereas the latter come from the usual variance formula involving the error components $\alpha_i$ and $e_{it}$.

We take 1000 replications and we call the earlier defined program mcprog_pool (Monte Carlo Program - Pooled Regression).

Finally, we rename the coefficient for better readability and drop everything else from the data frame (keep beta_pool). The outcomes of all Monte Carlo runs are compared at the end of this section.

2.4.1. Pooled Regression

%%stata -o pool 
timer on 1
set seed 100
capture program drop mcprog_pool
program mcprog_pool
        clear
        drawnorm alpha_i, n(200)
        gen i = _n 
        expand 5
        bys i: g t = _n 
        drawnorm nu_it e_it, n(1000)
        g x_it=nu_it+alpha_i
        g y_it=3+alpha_i+2*x_it+e_it
        reg y_it x_it, 
        drop nu_it alpha_i e_it y_it x_it
end
        simulate _b _se, nodots nolegend  reps(1000): mcprog_pool
        rename _b_x_it beta_pool
        rename _se_x_it se_pool
        keep beta_pool se_pool
timer off 1
display "The time this code runs is"
timer list 1
The time this code runs is
   1:     11.40 /        1 =      11.3990
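The data frame pool returned via -o now contains one row per replication, with the columns beta_pool and se_pool. The Monte Carlo distribution discussed above can therefore be summarized directly on the Python side; a minimal sketch:

print pool['beta_pool'].mean()   # average simulated coefficient
print pool['beta_pool'].std()    # Monte Carlo standard deviation of the coefficient
print pool['se_pool'].mean()     # average of the estimated standard errors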

2.4.2. FE Estimation

%%stata -o fe
timer on 1
set seed 100
capture program drop mcprog_fe
        program mcprog_fe
         clear
         drawnorm alpha_i, n(200)
         gen i = _n 
         expand 5
         bys i: g t = _n 
         drawnorm nu_it e_it, n(1000)
         g x_it=nu_it+alpha_i
         g y_it=3+alpha_i+2*x_it+e_it
         xtset i t
         xtreg y_it x_it, fe 
         drop nu_it alpha_i e_it y_it x_it
end
        simulate _b _se, nodots nolegend  reps(1000): mcprog_fe
        rename _b_x_it beta_fe
        rename _se_x_it se_fe
        keep beta_fe se_fe
timer off 1
display "The time this code runs is"
timer list 1
        
The time this code runs is
   1:     60.21 /        1 =      60.2140

2.4.3. FD Estimation

%%stata -o fd
timer on 1
set seed 100
capture program drop mcprog_fd
        program mcprog_fd
        clear
        drawnorm alpha_i, n(200)
        gen i = _n 
        expand 5
        bys i: g t = _n 
        drawnorm nu_it e_it, n(1000)
        g x_it=nu_it+alpha_i
        g y_it=3+alpha_i+2*x_it+e_it
        xtset i t
        g dx_it = d.x_it
        g dy_it = d.y_it
        reg dy_it dx_it, 
        drop nu_it alpha_i e_it dy_it dx_it y_it x_it
end
        simulate _b _se, nodots nolegend  reps(1000): mcprog_fd
        rename _b_dx_it beta_fd
        rename _se_dx_it se_fd
        keep beta_fd se_fd
timer off 1
display "The time this code runs is"
timer list 1
        
        
The time this code runs is
   1:     19.94 /        1 =      19.9360
.         

2.4.4. FD-FE (mix) Estimation

%%stata -o mix
timer on 1
set seed 100
capture program drop mcprog_mix
        program mcprog_mix
         clear
         drawnorm alpha_i, n(200)
         gen i = _n 
         expand 5
         bys i: g t = _n 
         drawnorm nu_it e_it, n(1000)
         g x_it=nu_it+alpha_i
        g y_it=3+alpha_i+2*x_it+e_it
        xtset i t
        g dx_it = d.x_it
        g dy_it = d.y_it
         xtreg dy_it dx_it, fe 
         drop nu_it alpha_i e_it dy_it dx_it y_it x_it
end
        simulate _b _se, nodots nolegend  reps(1000): mcprog_mix
        rename _b_dx_it beta_mix
        rename _se_dx_it se_mix
        keep beta_mix se_mix
        
timer off 1
display "The time this code runs is"
timer list 1
. timer off 1
The time this code runs is
   1:     62.94 /        1 =      62.9380

2.4.5. FD-FD (pure) Estimation

%%stata -o pure
timer on 1
set seed 100
capture program drop mcprog_pure
        program mcprog_pure
         clear
         drawnorm alpha_i, n(200)
         gen i = _n 
         expand 5
         bys i: g t = _n 
         drawnorm nu_it e_it, n(1000)
         g x_it=nu_it+alpha_i
         g y_it=3+alpha_i+2*x_it+e_it
        xtset i t
        g d2x_it = d2.x_it
        g d2y_it = d2.y_it
         reg d2y_it d2x_it, 
         drop nu_it alpha_i e_it y_it x_it d2y_it d2x_it
end
        simulate _b _se, nodots nolegend  reps(1000): mcprog_pure
        rename _b_d2x_it beta_pure
        rename _se_d2x_it se_pure
        keep beta_pure se_pure
        
timer off 1
display "The time this code runs is"
timer list 1
. timer off 1
The time this code runs is
   1:     18.97 /        1 =      18.9680

First, before we discuss the regression results, we want to point out that the within transformation in FE and mix is the reason why the run time is much higher in those two simulations. Thus, these methods are less efficient in terms of computational speed.
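For the later speed comparison with Python, the analogue of Stata's timer commands is simply taking time stamps around the code of interest; a minimal sketch using only the standard library:

import time

start = time.time()
# ... run one Monte Carlo study here ...
print "elapsed seconds:", time.time() - start   # rough analogue of `timer on` / `timer off` / `timer list`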

2.4.6. Comparison of results

############################ Regression Results without trend
# Pool Regression Results
pool_mean = pool.mean() # Storing the results produced by the pool simulation
beta_pool = pool_mean[0] # Results of the coefficient
se_pool = pool_mean[1] # Results of the standard error
# FE Regression Results
fe_mean = fe.mean()
beta_fe = fe_mean[0]
se_fe = fe_mean[1]
# FD Regression Results
fd_mean = fd.mean()
beta_fd = fd_mean[0]
se_fd = fd_mean[1]
# Mix Regression Results
mix_mean = mix.mean()
beta_mix = mix_mean[0]
se_mix = mix_mean[1]
# Pure Regression Results
pure_mean = pure.mean()
beta_pure = pure_mean[0]
se_pure = pure_mean[1]


results_notrend_beta = np.array([beta_pool, beta_fe, beta_fd, beta_mix, beta_pure])
results_notrend_se = np.array([se_pool, se_fe, se_fd, se_mix, se_pure])
results_notrend = np.mat([results_notrend_beta, results_notrend_se]) 
results_notrend = pd.DataFrame(results_notrend)
results_notrend.index = ["Beta", "SE"]
results_notrend.columns = ["Pool", "FE", "FD", "mix", "pure"]
print results_notrend
          Pool        FE        FD       mix      pure
Beta  2.499105  2.001513  2.000731  2.000577  1.999209
SE    0.027386  0.035339  0.035368  0.040823  0.040857

We can see that the pooled regression is upward biased relative to the true beta value (2). This is in line with econometric theory: ignoring the panel structure and pooling over the unobserved individual effects produces biased results. All other methods produce unbiased results that are very close to each other. The standard errors are smallest for the pooled regression; FE and FD are very close to each other, and the two RT approaches have the highest variation. Each additional transformation implies a loss in efficiency.
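Using the plotly offline mode imported at the beginning, the point estimates collected in results_notrend can also be compared visually; a minimal sketch (a more detailed graphical comparison follows in section 3.5):

trace = go.Bar(x=list(results_notrend.columns), y=list(results_notrend.loc["Beta"]))
iplot([trace])   # bar chart of the mean coefficient per estimation method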

2.5. Monte Carlo study with the same trend for all individuals

Here we investigate what happens to our estimators when we include the same trend for all individuals. If we were only running single regressions as presented above, we could reuse the same dataset and simply replace a few variables; since we have to define programs, however, we copy each program and adjust it for the trend. We leave out the pooled regression because we already know that it is biased due to the constant $\alpha_i$ alone.

2.5.1. FE Regression

%%stata -o fe_st
timer on 1
set seed 100
capture program drop mcprog_fe
        program mcprog_fe
        clear
        drawnorm alpha_i, n(200)
        gen i = _n 
        expand 5
        bys i: g t = _n 
        drawnorm nu_it e_it, n(1000)
        g x_it = nu_it + alpha_i + t 
            *NOW: x_it is correlated with a linear term
        g y_it = 3 + alpha_i + 2*x_it + t + e_it 
            *DGP includes now a linear trend t
        xtset i t
        xtreg y_it x_it, fe 
        drop nu_it alpha_i e_it y_it x_it
end
        simulate _b _se, nodots nolegend  reps(100): mcprog_fe
        rename _b_x_it beta_fe_st
        rename _se_x_it se_fe_st
        keep beta_fe_st se_fe_st
            *st = same trend
timer off 1
display "The time this code runs is"
timer list 1
. timer off 1
The time this code runs is
   1:      5.88 /        1 =       5.8770

2.5.2. FD Regression

%%stata -o fd_st
timer on 1
set seed 100
capture program drop mcprog_fe
        program mcprog_fe
        clear
        drawnorm alpha_i, n(200)
        gen i = _n 
        expand 5
        bys i: g t = _n 
        drawnorm nu_it e_it, n(1000)
        g x_it = nu_it + alpha_i + t 
            * NOW: x_it is correlated with a linear term
        g y_it = 3 + alpha_i + 2*x_it + t + e_it 
            * DGP now includes the linear trend
        xtset i t
        g dx_it = d.x_it
        g dy_it = d.y_it
        reg dy_it dx_it, 
        drop nu_it alpha_i e_it y_it x_it dy_it dx_it
end
        simulate _b _se, nodots nolegend  reps(100): mcprog_fe
        rename _b_dx_it beta_fd_st
        rename _se_dx_it se_fd_st
        keep beta_fd_st se_fd_st
timer off 1
display "The time this code runs is"
timer list 1
The time this code runs is
   1:      2.24 /        1 =       2.2410

2.5.3. FD-FE (mix) Estimation

%%stata -o mix_st
timer on 1
set seed 100
capture program drop mcprog_mix
        program mcprog_mix
        clear
        drawnorm alpha_i, n(200)
        gen i = _n 
        expand 5
        bys i: g t = _n 
        drawnorm nu_it e_it, n(1000)
        g x_it = nu_it + alpha_i + t 
            * NOW: x_it is correlated with a linear term
        g y_it = 3 + alpha_i + 2*x_it + t + e_it 
            * DGP now includes the linear trend
        xtset i t
        g dx_it = d.x_it
        g dy_it = d.y_it
        xtreg dy_it dx_it, fe 
        drop nu_it alpha_i e_it dy_it dx_it y_it x_it
end
        simulate _b _se, nodots nolegend  reps(100): mcprog_mix
        rename _b_dx_it beta_mix_st
        rename _se_dx_it se_mix_st
        keep beta_mix_st se_mix_st
        
timer off 1
display "The time this code runs is"
timer list 1
. timer off 1
The time this code runs is
   1:      7.50 /        1 =       7.5020

2.5.4. FD-FD (pure) Estimation

%%stata -o pure_st
timer on 1
set seed 100
capture program drop mcprog_pure
        program mcprog_pure
        clear
        drawnorm alpha_i, n(200)
        gen i = _n 
        expand 5
        bys i: g t = _n 
        drawnorm nu_it e_it, n(1000)
        g x_it = nu_it + alpha_i + t 
            * NOW: x_it is correlated with a linear term
        g y_it = 3 + alpha_i + 2*x_it + t + e_it 
            * DGP now includes the linear trend
        xtset i t
        g d2x_it = d2.x_it
        g d2y_it = d2.y_it
        reg d2y_it d2x_it, 
            * cluster(i)
        drop nu_it alpha_i e_it y_it x_it d2y_it d2x_it
end
        simulate _b _se, nodots nolegend  reps(100): mcprog_pure
        rename _b_d2x_it beta_pure_st
        rename _se_d2x_it se_pure_st
        keep beta_pure_st se_pure_st
        
timer off 1
display "The time this code runs is"
timer list 1
. timer off 1
The time this code runs is
   1:      1.99 /        1 =       1.9880

2.5.5. Comparison of results

############################ Regression Results with same trend
# FE Regression Results
fe_mean = fe_st.mean()
beta_fe = fe_mean[0]
se_fe = fe_mean[1]
# FD Regression Results
fd_mean = fd_st.mean()
beta_fd = fd_mean[0]
se_fd = fd_mean[1]
# Mix Regression Results
mix_mean = mix_st.mean()
beta_mix = mix_mean[0]
se_mix = mix_mean[1]
# Pure Regression Results
pure_mean = pure_st.mean()
beta_pure = pure_mean[0]
se_pure = pure_mean[1]


results_sametrend_beta = np.array([beta_fe, beta_fd, beta_mix, beta_pure])
results_sametrend_se = np.array([se_fe, se_fd, se_mix, se_pure])
results_sametrend = np.mat([results_sametrend_beta, results_sametrend_se]) 
results_sametrend = pd.DataFrame(results_sametrend)
results_sametrend.index = ["Beta", "SE"]
results_sametrend.columns = [ "FE", "FD", "mix", "pure"]
print results_sametrend
            FE        FD       mix      pure
Beta  2.714095  2.003978  2.004183  2.002594
SE    0.024645  0.035522  0.040986  0.041070

As expected, the simulation shows that if the trend is the same for all individuals in the data, the within transformation cannot remove the linear trend, so the coefficient is biased. Somewhat surprisingly, however, the first difference is enough to solve the problem introduced by the correlation between the variable of interest and the trend. We will not take a deeper look here, but it would be interesting to investigate how many individuals may deviate from the common trend before the first difference becomes biased as well. The FD is thereby more robust than the FE. It is, however, questionable in empirical work whether the observed individuals really follow the same trend.

Further, the first difference performs better than the mix, but worse than the pure. The pure again has a slightly higher standard error than the mix.

2.6. Monte Carlo study with individual specific linear trends (Random Trend)

Here we (finally) introduce the model with the individual specific linear trend. We expect that only the two RT methods will solve the bias in $x_{it}$ and that both FE and FD now perform poorly.

2.6.1 FE Estimation

%%stata -o fe_it
timer on 1
set seed 100
capture program drop mcprog_fe
        program mcprog_fe
        clear
        drawnorm alpha_i, n(200)
        gen i = _n 
        expand 5
        bys i: g t = _n 
        gen trend = t*i
            * individual specific trend
        drawnorm nu_it e_it, n(1000)
        g x_it = nu_it + alpha_i + trend 
            * NOW: x_it is correlated with a linear specific term
        g y_it = 3 + alpha_i + 2*x_it + trend + e_it 
        xtset i t
        xtreg y_it x_it, fe 
            * cluster(i)
        drop nu_it alpha_i e_it y_it x_it
end
        simulate _b _se, nodots nolegend  reps(100): mcprog_fe
        rename _b_x_it beta_fe_it
        rename _se_x_it se_fe_it
        keep beta_fe_it se_fe_it
            * it = individual trend
timer off 1
display "The time this code runs is"
timer list 1
. timer off 1
The time this code runs is
   1:      5.66 /        1 =       5.6590

2.6.2 FD Estimation

%%stata -o fd_it
timer on 1
set seed 100
capture program drop mcprog_fe
        program mcprog_fe
        clear
        drawnorm alpha_i, n(200)
        gen i = _n 
        expand 5
        bys i: g t = _n 
        gen trend = t*i 
            * individual specific trend
        drawnorm nu_it e_it, n(1000)
        g x_it = nu_it + alpha_i + trend 
            * NOW: x_it is correlated with a linear specific term
        g y_it = 3 + alpha_i + 2*x_it + trend + e_it 
        xtset i t
        g dx_it = d.x_it
        g dy_it = d.y_it
        reg dy_it dx_it, 
        drop nu_it alpha_i e_it y_it x_it dy_it dx_it
end
        simulate _b _se, nodots nolegend  reps(100): mcprog_fe
        rename _b_dx_it beta_fd_it
        rename _se_dx_it se_fd_it
        keep beta_fd_it se_fd_it
timer off 1
display "The time this code runs is"
timer list 1
The time this code runs is
   1:      1.86 /        1 =       1.8610

2.6.3 FD-FE (mix) Estimation

%%stata -o mix_it
timer on 1
set seed 100
capture program drop mcprog_mix
        program mcprog_mix
        clear
        drawnorm alpha_i, n(200)
        gen i = _n
        expand 5
        bys i: g t = _n 
        gen trend = t*i 
            * individual specific trend
        drawnorm nu_it e_it, n(1000)
        g x_it = nu_it + alpha_i + trend 
            * NOW: x_it is correlated with a linear specific term
        g y_it = 3 + alpha_i + 2*x_it + trend + e_it 
        xtset i t
        g dx_it = d.x_it
        g dy_it = d.y_it
        xtreg dy_it dx_it, fe 
        drop nu_it alpha_i e_it dy_it dx_it y_it x_it
end
        simulate _b _se, nodots nolegend  reps(100): mcprog_mix
        rename _b_dx_it beta_mix_it
        rename _se_dx_it se_mix_it
        keep beta_mix_it se_mix_it
        
timer off 1
display "The time this code runs is"
timer list 1
. timer off 1
The time this code runs is
   1:      6.04 /        1 =       6.0380

2.6.4 FD-FD (pure) Estimation

%%stata -o pure_it
timer on 1
set seed 100
capture program drop mcprog_pure
        program mcprog_pure
        clear
        drawnorm alpha_i, n(200)
        gen i = _n 
        expand 5
        bys i: g t = _n 
        gen trend = t*i 
            * individual specific trend
        drawnorm nu_it e_it, n(1000)
        g x_it = nu_it + alpha_i + trend 
            * NOW: x_it is correlated with a linear specific term
        g y_it = 3 + alpha_i + 2*x_it + trend + e_it 
        xtset i t
        g d2x_it = d2.x_it
        g d2y_it = d2.y_it
        reg d2y_it d2x_it, 
        drop nu_it alpha_i e_it y_it x_it d2y_it d2x_it
end
        simulate _b _se, nodots nolegend  reps(100): mcprog_pure
        rename _b_d2x_it beta_pure_it
        rename _se_d2x_it se_pure_it
        keep beta_pure_it se_pure_it
        
timer off 1
display "The time this code runs is"
timer list 1
. timer off 1
The time this code runs is
   1:      2.39 /        1 =       2.3920

2.6.5. Comparison of results

############################ Regression Results with individual specific linear trend
# FE Regression Results
fe_mean_it = fe_it.mean()
beta_fe_it = fe_mean_it[0]
se_fe_it = fe_mean_it[1]
# FD Regression Results
fd_mean_it = fd_it.mean()
beta_fd_it = fd_mean_it[0]
se_fd_it = fd_mean_it[1]
# Mix Regression Results
mix_mean_it = mix_it.mean()
beta_mix_it = mix_mean_it[0]
se_mix_it = mix_mean_it[1]
# Pure Regression Results
pure_mean_it = pure_it.mean()
beta_pure_it = pure_mean_it[0]
se_pure_it = pure_mean_it[1]


results_inditrend_beta = np.array([beta_fe_it, beta_fd_it, beta_mix_it, beta_pure_it])
results_inditrend_se = np.array([se_fe_it, se_fd_it, se_mix_it, se_pure_it])
results_inditrend = np.mat([results_inditrend_beta, results_inditrend_se]) 
results_inditrend = pd.DataFrame(results_inditrend)
results_inditrend.index = ["Beta", "SE"]
results_inditrend.columns = [ "FE", "FD", "mix", "pure"]
print results_inditrend
            FE        FD       mix      pure
Beta  2.999957  2.999481  2.004183  2.002594
SE    0.000272  0.001220  0.040986  0.041070

The results confirm (unsurprisingly) the poor performance of the FE. Now the FD is also not able to control for the trend in $x_{it}$. Unbiased estimates are obtained only by mix and pure. Their results are the same as before (remember: the seed is the same and the trend is differenced out in both cases). After checking that we really did change the code, we conclude that for the random-trend estimators it does not matter whether the trend is constant or individual specific.

In this simple example it further seems that the double difference (pure) is more robust, i.e. closer to the true value, so the example favors the pure over the mix random-trend method. In terms of computation time, pure is also much faster. Only the standard error speaks in favor of mix.

2.7. Monte Carlo study with nonlinear individual specific trends

We further check two cases in which the trend is non-linear. The motivation is to see whether one of the random-trend methods performs better even though the baseline assumption of a linear trend does not hold. We therefore use an exponential individual specific trend and a logarithmic individual specific trend.

Here we no longer include the FE estimation, because it performs as badly as or worse than the FD. The FD is included to check whether the RT methods are of additional help.

2.7.1.1 FD Estimation

%%stata -o fd_exp
timer on 1
set seed 100
capture program drop mcprog_fe
        program mcprog_fe
        clear
        drawnorm alpha_i, n(200)
        gen i = _n 
        expand 5
        bys i: g t = _n 
        gen trend = exp(t*i/100) 
            * exponential individual specific trend
            * We divide by 100, because otherwise the values of the trend would have been too high at the end
            * and would have dominated the whole term, in our case resulting in no variation between the replications
        drawnorm nu_it e_it, n(1000)
        g x_it = nu_it + alpha_i + trend 
        g y_it = 3 + alpha_i + 2*x_it + trend + e_it 
        xtset i t
        g dx_it = d.x_it
        g dy_it = d.y_it
        reg dy_it dx_it, 
        drop nu_it alpha_i e_it y_it x_it dy_it dx_it
end
        simulate _b _se, nodots nolegend  reps(100): mcprog_fe
        rename _b_dx_it beta_fd_exp
        rename _se_dx_it se_fd_exp
        keep beta_fd_exp se_fd_exp
timer off 1
display "The time this code runs is"
timer list 1
The time this code runs is
   1:      2.05 /        1 =       2.0480

2.7.1.2 FD-FE (mix) Estimation

%%stata -o mix_exp
timer on 1
set seed 100
capture program drop mcprog_mix
        program mcprog_mix
        clear
        drawnorm alpha_i, n(200)
        gen i = _n 
        expand 5
        bys i: g t = _n 
        gen trend = exp(t*i/100) 
            * exponential individual specific trend
        drawnorm nu_it e_it, n(1000)
        g x_it = nu_it + alpha_i + trend 
        g y_it = 3 + alpha_i + 2*x_it + trend + e_it 
        xtset i t
        g dx_it = d.x_it
        g dy_it = d.y_it
        xtreg dy_it dx_it, fe 
        drop nu_it alpha_i e_it dy_it dx_it y_it x_it
end
        simulate _b _se, nodots nolegend  reps(100): mcprog_mix
        rename _b_dx_it beta_mix_exp
        rename _se_dx_it se_mix_exp
        keep beta_mix_exp se_mix_exp
        
timer off 1
display "The time this code runs is"
timer list 1
. timer off 1
The time this code runs is
   1:      6.72 /        1 =       6.7180

2.7.1.3 FD-FD (pure) Estimation

%%stata -o pure_exp
timer on 1
set seed 100
capture program drop mcprog_pure
        program mcprog_pure
        clear
        drawnorm alpha_i, n(200)
        gen i = _n 
        expand 5
        bys i: g t = _n 
        gen trend = exp(t*i/100) 
            * exponential individual specific trend
        drawnorm nu_it e_it, n(1000)
        g x_it = nu_it + alpha_i + trend 
        g y_it = 3 + alpha_i + 2*x_it + trend + e_it 
        xtset i t
        g d2x_it = d2.x_it
        g d2y_it = d2.y_it
        reg d2y_it d2x_it, 
        drop nu_it alpha_i e_it y_it x_it d2y_it d2x_it
end
        simulate _b _se, nodots nolegend  reps(100): mcprog_pure
        rename _b_d2x_it beta_pure_exp
        rename _se_d2x_it se_pure_exp
        keep beta_pure_exp se_pure_exp
            * exp for exponential
        
timer off 1
display "The time this code runs is"
timer list 1
.         The time this code runs is
   1:      2.64 /        1 =       2.6430

2.7.1.4 Comparison of results

############################ Regression Results with exponential individual specific trend
# FD Regression Results
fd_mean_exp = fd_exp.mean()
beta_fd_exp = fd_mean_exp[0]
se_fd_exp = fd_mean_exp[1]
# Mix Regression Results
mix_mean_exp = mix_exp.mean()
beta_mix_exp = mix_mean_exp[0]
se_mix_exp = mix_mean_exp[1]
# Pure Regression Results
pure_mean_exp = pure_exp.mean()
beta_pure_exp = pure_mean_exp[0]
se_pure_exp = pure_mean_exp[1]


results_exptrend_beta = np.array([beta_fd_exp, beta_mix_exp, beta_pure_exp])
results_exptrend_se = np.array([se_fd_exp, se_mix_exp, se_pure_exp])
results_exptrend = np.mat([results_exptrend_beta, results_exptrend_se]) 
results_exptrend = pd.DataFrame(results_exptrend)
results_exptrend.index = ["Beta", "SE"]
results_exptrend.columns = ["FD", "mix", "pure"]
print results_exptrend
            FD       mix      pure
Beta  3.000002  3.000002  2.999999
SE    0.000034  0.000045  0.000069

Here the trend increases exponentially and is individual specific, i.e. the trend is a convex function of time. All methods are biased; the difference between the two RT methods only appears at the sixth digit, so it is negligible from our point of view. Surprisingly to us, there is no difference between the plain FD and the RT methods. This means that if the trend increases exponentially, the random-trend methods do not improve the estimation. We would have expected the random-trend model to come a bit closer to the true value, but in this example it does not. Thus, we only get higher standard errors without the benefit of more robust results.

2.7.2.1 FD Estimation

%%stata -o fd_log
timer on 1
set seed 100
capture program drop mcprog_fe
        program mcprog_fe
        clear
        drawnorm alpha_i, n(200)
        gen i = _n 
        expand 5
        bys i: g t = _n 
        gen trend = log(t*i) 
            * logarithm !!! 
        drawnorm nu_it e_it, n(1000)
        g x_it = nu_it + alpha_i + trend 
        g y_it = 3 + alpha_i + 2*x_it + trend + e_it 
        xtset i t
        g dx_it = d.x_it
        g dy_it = d.y_it
        reg dy_it dx_it
        drop nu_it alpha_i e_it y_it x_it dy_it dx_it
end
        simulate _b _se, nodots nolegend  reps(100): mcprog_fe
        rename _b_dx_it beta_fd_log
        rename _se_dx_it se_fd_log
        keep beta_fd_log se_fd_log
timer off 1
display "The time this code runs is"
timer list 1
The time this code runs is
   1:      2.61 /        1 =       2.6080

2.7.2.2. FD-FE (mix) Estimation

%%stata -o mix_log
timer on 1
set seed 100
capture program drop mcprog_mix
        program mcprog_mix
        clear
        drawnorm alpha_i, n(200)
        gen i = _n 
        expand 5
        bys i: g t = _n 
        gen trend = log(t*i) 
            * logarithm!
        drawnorm nu_it e_it, n(1000)
        g x_it = nu_it + alpha_i + trend 
        g y_it = 3 + alpha_i + 2*x_it + trend + e_it 
        xtset i t
        g dx_it = d.x_it
        g dy_it = d.y_it
        xtreg dy_it dx_it, fe 
            * cluster(i)
        drop nu_it alpha_i e_it dy_it dx_it y_it x_it
end
        simulate _b _se, nodots nolegend  reps(100): mcprog_mix
        rename _b_dx_it beta_mix_log
        rename _se_dx_it se_mix_log
        keep beta_mix_log se_mix_log
        
timer off 1
display "The time this code runs is"
timer list 1
. timer off 1
The time this code runs is
   1:      7.89 /        1 =       7.8950

2.7.2.3. FD-FD (pure) Estimation

%%stata -o pure_log
timer on 1
set seed 100
capture program drop mcprog_pure
        program mcprog_pure
        clear
        drawnorm alpha_i, n(200)
        gen i = _n 
        expand 5
        bys i: g t = _n 
        gen trend = log(t*i) 
            * logarithm! 
        drawnorm nu_it e_it, n(1000)
        g x_it = nu_it + alpha_i + trend 
        g y_it = 3 + alpha_i + 2*x_it + trend + e_it 
        xtset i t
        g d2x_it = d2.x_it
        g d2y_it = d2.y_it
        reg d2y_it d2x_it, 
        drop nu_it alpha_i e_it y_it x_it d2y_it d2x_it
end
        simulate _b _se, nodots nolegend  reps(100): mcprog_pure
        rename _b_d2x_it beta_pure_log
        rename _se_d2x_it se_pure_log
        keep beta_pure_log se_pure_log
        
timer off 1
display "The time this code runs is"
timer list 1
    
. timer off 1
The time this code runs is
   1:      2.29 /        1 =       2.2950

2.7.2.4 Comparison of results

############################ Regression Results with log individual specific trend
# FD Regression Results
fd_mean_log = fd_log.mean()
beta_fd_log = fd_mean_log[0]
se_fd_log = fd_mean_log[1]
# Mix Regression Results
mix_mean_log = mix_log.mean()
beta_mix_log = mix_mean_log[0]
se_mix_log = mix_mean_log[1]
# Pure Regression Results
pure_mean_log = pure_log.mean()
beta_pure_log = pure_mean_log[0]
se_pure_log = pure_mean_log[1]


results_logtrend_beta = np.array([beta_fd_log, beta_mix_log, beta_pure_log])
results_logtrend_se = np.array([se_fd_log, se_mix_log, se_pure_log])
results_logtrend = np.mat([results_logtrend_beta, results_logtrend_se]) 
results_logtrend = pd.DataFrame(results_logtrend)
results_logtrend.index = ["Beta", "SE"]
results_logtrend.columns = ["FD", "mix", "pure"]
print results_logtrend
            FD       mix      pure
Beta  2.020059  2.021325  2.004355
SE    0.035496  0.040955  0.041070

For the logarithmic individual specific trend, the trend is upward sloping but its marginal increase decreases, i.e. the trends are concave. All three methods do a good job of capturing the unobserved trend component. As with the constant trend, the mix performs worst, FD second and pure best.

Again, FD is capable of solving the endogeneity problem, so there is no need for a random-trend model here.

2.7.3. The within transformation as a first step

We discussed in the beginning that it does not make sense theoretically to first transform the data using the within transformation and then take first differences; for canceling the trend it was crucial to take the first difference first. We verify here within the Monte Carlo framework that reversing the order produces biased results.

2.7.3.1 FE-FE Estimation

%%stata -o fe_fe
timer on 1
set seed 100
capture program drop mcprog_fe
        program mcprog_fe
        clear
        drawnorm alpha_i, n(200)
        gen i = _n 
        expand 5
        bys i: g t = _n 
        gen trend = t*i  
        drawnorm nu_it e_it, n(1000)
        g x_it = nu_it + alpha_i + trend 
        g y_it = 3 + alpha_i + 2*x_it + trend + e_it 
        xtset i t
        xtdata i t x_it y_it, fe clear 
            * performs the within-transformation, so we can be sure that it is really applied first
        xtreg y_it x_it, fe 
            * FE-FE: within transformation applied twice
        drop y_it x_it
end
        simulate _b, nodots nolegend  reps(100): mcprog_fe
        rename _b_x_it beta_fe_fe
        keep beta_fe_fe 
            * fe_fe = fixed effects twice
timer off 1
display "The time this codes run is"
timer list 1
. timer off 1
The time this code runs is
   1:      7.61 /        1 =       7.6050

2.7.3.2 FE-FD Estimation

%%stata -o fe_fd
timer on 1
set seed 100
capture program drop mcprog_fe
        program mcprog_fe
        clear
        drawnorm alpha_i, n(200)
        gen i = _n 
        expand 5
        bys i: g t = _n 
        gen trend = t*i  
        drawnorm nu_it e_it, n(1000)
        g x_it = nu_it + alpha_i + trend 
        g y_it = 3 + alpha_i + 2*x_it + trend + e_it 
        xtset i t
        xtdata i t x_it y_it, fe clear 
            * performs the within-transformation, so we can be sure that it is really applied first
        xtset i t 
            * to sort the data
        g dx_it = d.x_it
        g dy_it = d.y_it
        reg dy_it dx_it,  
            * Random-Trend (FE-FD)
        drop y_it x_it
end
        simulate _b, nodots nolegend  reps(100): mcprog_fe
        rename _b_dx_it beta_fe_fd
        keep beta_fe_fd 
            * fe_fd = first difference after the within transformation
timer off 1
display "The time this code runs is"
timer list 1
. timer off 1
The time this code runs is
   1:      3.56 /        1 =       3.5580

2.7.3.3 Comparison of results

fe_fe_mean = fe_fe.mean()
fe_fd_mean = fe_fd.mean()

results_within_beta = np.mat([fe_fe_mean, fe_fd_mean])
results_within = pd.DataFrame(results_within_beta)
results_within.index = ["FE-FE", "FE-FD"]
results_within.columns = ["Beta"]
print results_within
           Beta
FE-FE  2.999957
FE-FD  2.999481

The results are just as biased as the plain FE. The second transformation did not help at all, i.e. the results confirm the theoretical argument. We suppress the standard errors because the estimates are inconsistent anyway; the outcome is in line with our expectations.


3. Random Trend Simulation using Python

Since the course requires not only a strong focus on simulation within Jupyter Notebook, but also the use of free software packages (especially Python), we now perform a similar simulation using Python.

As before, we will first do an example with just one simulation to verify that the code is working properly. In a second step, we will run the same simulation with 1,000 replications.

Following the conclusions from our benchmark results with ipystata, we only look at the case of the linear individual specific trend; if we get similar results here, they should carry over to the other cases as well. Besides, the RT estimation was our main question of interest in the first place.

There are some packages that can be used to run regressions, but we did not find a package that explicitly performs panel data transformations. Since we had to write the transformations ourselves, we decided to calculate the regression ourselves as well.
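As a sketch of what "calculating the regression ourselves" means here: the slope estimates below come from ordinary least squares computed directly with numpy. The helper below is purely illustrative (the function name and set-up are our own, not part of any package):

def ols(y, X):
    # OLS coefficients: beta_hat = (X'X)^(-1) X'y, computed with a least-squares solver
    return np.linalg.lstsq(X, y)[0]

# usage on (transformed) data, e.g. with a constant and one regressor:
# X = np.column_stack((np.ones(len(x_it)), x_it))
# print ols(y_it, X)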

3.1. DGP

np.random.seed(100) # Equivalent to `set seed 100`
############################################### Note: numpy draws must be seeded via np.random.seed; seeding the `random` module alone has no effect on them

# For the coding part, we set some variables which were not necessary in Stata:
n = 200 # in python there is no need (or not allowed) to write `generate`
T = 5
N = n * T

alpha_i = np.random.normal(0, 1, 200) # Equivalent to `drawnorm alpha_i, n(200)`
alpha_i = np.repeat(alpha_i,5) # Equivalent to `expand 5`  

i = np.arange(n) + 1 # Equivalent to `gen i = _n`  # +1 because python would start with 0
i = np.repeat(i,T) # Equivalent to `expand 5`  

t = np.arange(5) + 1 # Equivalent to bys i: g t = _n 
t = np.tile(t,200) # repeat the 1..5 time index for each of the 200 individuals

trend = t*i # <=> `gen trend = t*i`

mean = [0, 0]
cov = [[1, 0], [0, 1]]
nu_it, e_it = np.random.multivariate_normal(mean, cov, 1000).T # <=> `drawnorm nu_it e_it, n(1000)`

x_it = (nu_it + alpha_i + trend).T # <=> `g x_it = nu_it + alpha_i + trend`
# Since we will transform the data at least once, we don't have to include an intercept in the first place.

y_it = (3 + alpha_i + 2*x_it + trend + e_it).T # <=> `g y_it = 3 + alpha_i + 2*x_it + trend + e_it `

m = np.matrix((i, t, alpha_i, trend, y_it, x_it, e_it))
mt = m.transpose()
df = pd.DataFrame(mt)
df.columns = ["Individual", "Time", "alpha_i", "trend", "y_it", "x_it", "e_it"]
print(df)
     Individual  Time   alpha_i  trend         y_it        x_it      e_it
0             1     1 -0.236971      1     4.305062    0.578612 -0.615191
1             1     2 -0.236971      2     7.340750    0.569365  1.438992
2             1     3 -0.236971      3    13.768764    3.913316  0.179103
3             1     4 -0.236971      4    13.797010    4.419268 -1.804555
4             1     5 -0.236971      5    17.562235    5.046351 -0.293496
5             2     1 -0.494967      2     9.916523    2.642953  0.125585
6             2     2 -0.494967      4    14.168131    3.572375  0.518348
7             2     3 -0.494967      6    20.508156    7.496187 -2.989251
8             2     4 -0.494967      8    27.470208    9.130571 -1.295966
9             2     5 -0.494967     10    31.740789    9.783270 -0.330784
10            3     1 -0.420040      3    10.604935    2.311382  0.402209
11            3     2 -0.420040      6    21.689798    6.722152 -0.334466
12            3     3 -0.420040      9    27.711661    7.434243  1.263214
13            3     4 -0.420040     12    38.042979   12.984805 -2.506592
14            3     5 -0.420040     15    45.487971   14.852207 -1.796402
15            4     1  0.350938      4    16.466519    4.784984 -0.454388
16            4     2  0.350938      8    25.360237    6.896944  0.215412
17            4     3  0.350938     12    42.586552   13.447426  0.340763
18            4     4  0.350938     16    50.176044   15.237954  0.349198
19            4     5  0.350938     20    63.994312   19.393273  1.856828
20            5     1 -0.173144      5    16.567723    4.496239 -0.251611
21            5     2 -0.173144     10    30.822669    8.753520  0.488772
22            5     3 -0.173144     15    42.891455   13.029056 -0.993513
23            5     4 -0.173144     20    65.526287   20.435489  1.828454
24            5     5 -0.173144     25    78.757739   24.897136  1.136612
25            6     1  1.276827      6    27.014877    8.441791 -0.145532
26            6     2  1.276827     12    42.493650   13.537954 -0.859085
27            6     3  1.276827     18    62.451930   19.349627  1.475848
28            6     4  1.276827     24    80.901660   25.774441  1.075950
29            6     5  1.276827     30    93.071628   29.792737 -0.790672
..          ...   ...       ...    ...          ...         ...       ...
970         195     1 -0.694587    195   585.933925  194.971445 -1.314377
971         195     2 -0.694587    390  1170.804315  389.184816  0.129270
972         195     3 -0.694587    585  1759.340533  586.251817 -0.468513
973         195     4 -0.694587    780  2340.151254  779.700444 -1.555046
974         195     5 -0.694587    975  2927.438893  975.976694 -1.819907
975         196     1 -0.802931    196   593.085933  196.876589  1.135686
976         196     2 -0.802931    392  1177.063389  390.190930  2.484460
977         196     3 -0.802931    588  1763.106228  586.732864 -0.556570
978         196     4 -0.802931    784  2354.379797  784.130407 -0.078086
979         196     5 -0.802931    980  2942.011309  979.379733  1.054774
980         197     1 -0.604298    197   591.811666  196.568250 -0.720537
981         197     2 -0.604298    394  1181.937125  392.953902 -0.366381
982         197     3 -0.604298    591  1773.717583  589.431342  1.459195
983         197     4 -0.604298    788  2362.120365  786.510192 -1.295721
984         197     5 -0.604298    985  2951.836160  981.937982  0.564494
985         198     1 -0.003049    198   597.940201  198.348610  0.246029
986         198     2 -0.003049    396  1191.242052  396.410192 -0.575283
987         198     3 -0.003049    594  1790.000596  596.138268  0.727110
988         198     4 -0.003049    792  2377.468602  791.193571  0.084509
989         198     5 -0.003049    990  2974.113773  990.780222 -0.443622
990         199     1  0.314626    199   606.077908  201.964604 -0.165925
991         199     2  0.314626    398  1195.762301  397.757528 -1.067381
992         199     3  0.314626    597  1794.006256  597.115677 -0.539724
993         199     4  0.314626    796  2388.700678  795.783132 -2.180212
994         199     5  0.314626    995  2986.262423  994.869034 -1.790271
995         200     1  0.404043    200   602.790869  200.157732 -0.928638
996         200     2  0.404043    400  1207.286313  401.180779  1.520712
997         200     3  0.404043    600  1807.394473  601.812310  0.365811
998         200     4  0.404043    800  2406.242163  801.937966 -1.037813
999         200     5  0.404043   1000  3003.703726  999.611251  1.077180

[1000 rows x 7 columns]

The data generating process is now done completely with open source packages (numpy). We print the data frame to get a quick overview of whether the DGP is correct. Judging from the output, everything seems fine so far.

The problem that arises now, however, is that our Internet research did not provide us with commands that can analyze panel data as easily as Stata can, i.e. there is no command directly comparable to xtset i t and xtreg y_it x_it, fe. We therefore have to construct the transformations ourselves.
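For reference, much of this could in principle be written compactly with pandas groupby operations on the data frame df defined above. The following is only a sketch of that idea (with hypothetical column names dx_it, d2x_it, dwx_it and so on); for the results below we keep the explicit numpy version to retain full control over the calculations.

# Sketch only: pandas-based panel transformations on the data frame df from above
fd = df.copy()
fd["dx_it"] = fd.groupby("Individual")["x_it"].diff()  # first-difference within each individual
fd["dy_it"] = fd.groupby("Individual")["y_it"].diff()
fd = fd.dropna()                                        # the first period of each individual is lost
fd["d2x_it"] = fd.groupby("Individual")["dx_it"].diff() # second difference (pure approach)
fd["d2y_it"] = fd.groupby("Individual")["dy_it"].diff()
fd["dwx_it"] = fd["dx_it"] - fd.groupby("Individual")["dx_it"].transform("mean") # within after FD (mix approach)
fd["dwy_it"] = fd["dy_it"] - fd.groupby("Individual")["dy_it"].transform("mean")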

3.2. Data Transformation


# First and double Difference for x:
x_reshape = np.reshape(np.ravel(x_it), (200, 5)).T # reshape x-variable to wide form for the transformation
    # To verify the reshape command, uncomment the next two lines. We need 200 columns with 5 rows each: 200 individuals with 5 time periods each.
    #x_df = pd.DataFrame(x_reshape)
    #print x_df.head(n=5) # One can see that we now have 200 columns, each with 5 time periods
dx = np.zeros(x_reshape.shape) #Constructing a zero variable for the first-difference command which is following (this is necessary)
dx[1:] = x_reshape[1:] - x_reshape[:-1] # First-Difference
dx = np.delete(dx,(0), axis=0) # Dropping the first period which does not contain any information anymore.
    # Every first-difference means that we will lose one time period.
    #dx = numpy.delete(dx,(0), axis=1)
d2x = np.zeros(dx.shape) #Constructing a zero variable for the first-difference command which is following (this is necessary)
d2x[1:] = dx[1:] - dx[:-1] # Second-Difference
d2x = np.delete(d2x,(0), axis=0)
    #For verification, uncomment the next lines:
    #dx_df = pd.DataFrame(dx)
    #print dx_df.head(n=5) # The first entry is zero for every individual because we can't take the first-difference of the starting value. Comparing dx_df and x_df one can clearly see that the first-difference worked well.
    # Now we have to reshape the data into long form again for the regression
dx_reshape = dx # result is stored for the within-transformation    
dx = np.reshape(dx.T, N-n) # replacing dx with long form (to be used in a regression)
# We have to subtract n from N, because we delete one time period
d2x = np.reshape(d2x.T, N-2*n)
# Same here, just that now we deleted two time periods.
    #dx[:11] #to check if transition worked

# Within-transformation for x
dx_mean = dx_reshape.mean(axis=0) # obtaining the mean for each individual after the first-difference is done
dx_mean_i = np.repeat(dx_mean,T-1) # for the calculation, construct a mean-matrix with same shape as the x-variable
dx_mean_i_reshape = np.reshape(dx_mean_i.T, N-n) # Reshaping the data into long-form
dwx = dx - dx_mean_i_reshape # Subtracting the mean for the within-transformation
# dwx is now constructed for the Random-Trend 2, mix    
    
################ Same transformations for the y-variable
y_reshape = np.reshape(np.ravel(y_it), (200, 5)).T # reshape y-variable to wide form for the transformation
dy = np.zeros(y_reshape.shape) #Constructing a zero variable for the first-difference command which is following (this is necessary)
dy[1:] = y_reshape[1:] - y_reshape[:-1] # First-Difference
dy = np.delete(dy,(0), axis=0) # Dropping the first period which does not contain any information anymore.

d2y = np.zeros(dy.shape) #Constructing a zero variable for the first-difference command which is following (this is necessary)
d2y[1:] = dy[1:] - dy[:-1] # Second-Difference
d2y = np.delete(d2y,(0), axis=0)
dy_reshape = dy # result is stored for the within-transformation
dy = np.reshape(dy.T, N-n)
d2y = np.reshape(d2y.T, N-2*n)

dy_mean = dy_reshape.mean(axis=0) 
dy_mean_i = np.repeat(dy_mean,T-1) 
dy_mean_i_reshape = np.reshape(dy_mean_i.T, N-n) 
dwy = dy - dy_mean_i_reshape 
# dwy is now constructed for the Random-Trend 2, mix

Already judging from this transformation part, it seems very reasonable to rely on Stata if one wants to analyze panel data. Surely this code is not as efficient as it could be, since we have just started coding, and we are certain that it can be improved, but so far Stata still seems more convenient to us.

To take a closer look at our data transformations, we show some contour histograms of how the untransformed and transformed variables are distributed.

Relationship between untransformed yit and xit

iplot([Histogram2dContour(x=x_it, y=y_it, contours=Contours(coloring='heatmap'),showlegend = (True)),
       Scatter(x=x_it, y=y_it, mode='markers', marker=Marker(color='white', size=3, opacity=0.3))], show_link=False)

The contour histogram shows how the observations are distributed. A higher contour level is indicated by the change from blue to brown, as shown in the legend on the right.

Two things can be clearly seen in this graph: first, most of the observations lie in the lower left-hand corner, where the color changes from blue to brown, and from there the point cloud stretches up to the top right-hand corner. Secondly, the trend included in the DGP is clearly visible in the almost straight line of white dots (the actual points of the x- and y-coordinates).

Relationship between first-differenced Δyit and Δxit

iplot([Histogram2dContour(x=dx, y=dy, contours=Contours(coloring='heatmap')),
       Scatter(x=dx, y=dy, mode='markers', marker=Marker(color='white', size=3, opacity=0.3))], show_link=False)

For the first-differenced data we see (i) that there are now four regions where a lot of data points are located and (ii) that the data points in general are much closer to each other than before. Note that the axes shrink from (3000, 1000) down to (600, 250).

We thereby clearly see how the first-difference already changed the shape of the data.

Relationship between two first-differenced Δ2yit and Δ2xit (i.e. FD-FD)

iplot([Histogram2dContour(x=d2x, y=d2y, contours=Contours(coloring='heatmap')),
       Scatter(x=d2x, y=d2y, mode='markers', marker=Marker(color='white', size=3, opacity=0.3))], show_link=False)

The variation in the data has (unsurprisingly) decreased and the data is much more centered, with almost just one larger region in which most of the data is located. We can further see that the observations are no longer ordered along a straight line.

Relationship between first-differenced and de-meaned Δyit and Δxit (i.e. FD-FE)

iplot([Histogram2dContour(x=dwx, y=dwy, contours=Contours(coloring='heatmap')),
       Scatter(x=dwx, y=dwy, mode='markers', marker=Marker(color='white', size=3, opacity=0.3))], show_link=False)

The last contour histogram, of the mix approach, looks very similar to the one of the pure approach. The contour lines, however, look a bit more homogeneous, or less curvy. The data points are in general a bit closer to each other, given that the axes are of smaller scale. Lastly, the data cloud seems a bit fatter and less stretched.

Concluding, we can see clear differences introduced by the different transformation techniques, which should also show up in the regression results later on.

Now we will look at the same picture on a more granular level. We will pick only 4 individuals out of the simulation and plot the level and the two different random-trend transformations of the data.

Level:

v = 50 # Using the individual number defined in v and the next 3 ones.
x_df_s = pd.DataFrame(x_reshape) # Putting the data in a panda form makes it easier to take some individuals
y_df_s = pd.DataFrame(y_reshape)
# Create traces

trace1 = go.Scatter(
    x = x_df_s[v].values,
    y = y_df_s[v].values,
    mode = 'lines+markers',
    name = 'lines+markers'
)
trace2 = go.Scatter(
    x = x_df_s[v+1].values,
    y = y_df_s[v+1].values,
    mode = 'lines+markers',
    name = 'lines+markers'
)
trace3 = go.Scatter(
    x = x_df_s[v+2].values,
    y = y_df_s[v+2].values,
    mode = 'lines+markers',
    name = 'lines+markers'
)
trace4 = go.Scatter(
    x = x_df_s[v+3].values,
    y = y_df_s[v+3].values,
    mode = 'lines+markers',
    name = 'lines+markers'
)
trace0 = go.Scatter(
    x = np.concatenate((x_df_s[v], x_df_s[v+1], x_df_s[v+2], x_df_s[v+3]), axis=0),
    y = np.concatenate((y_df_s[v], y_df_s[v+1], y_df_s[v+2], y_df_s[v+3]), axis=0),
    mode = 'markers',
    name = 'markers'
)
data = [ trace1, trace2, trace3, trace4, trace0]

# Plot and embed in ipython notebook!
py.iplot(data, filename='6')

We can see a clear positive relationship between xit and yit. This is partly due to the positive coefficient β, which is here equal to two, and partly because of the positive trend.

Pure (FD-FD):

d2x_df_s = pd.DataFrame(d2x.reshape(3,200))
d2y_df_s = pd.DataFrame(d2y.reshape(3,200))
# Create traces

trace1 = go.Scatter(
    x = d2x_df_s[v].values,
    y = d2y_df_s[v].values,
    mode = 'lines+markers',
    name = 'lines+markers'
)
trace2 = go.Scatter(
    x = d2x_df_s[v+1].values,
    y = d2y_df_s[v+1].values,
    mode = 'lines+markers',
    name = 'lines+markers'
)
trace3 = go.Scatter(
    x = d2x_df_s[v+2].values,
    y = d2y_df_s[v+2].values,
    mode = 'lines+markers',
    name = 'lines+markers'
)
trace4 = go.Scatter(
    x = d2x_df_s[v+3].values,
    y = d2y_df_s[v+3].values,
    mode = 'lines+markers',
    name = 'lines+markers'
)
trace0 = go.Scatter(
    x = np.concatenate((d2x_df_s[v], d2x_df_s[v+1], d2x_df_s[v+2], d2x_df_s[v+3]), axis=0),
    y = np.concatenate((d2y_df_s[v], d2y_df_s[v+1], d2y_df_s[v+2], d2y_df_s[v+3]), axis=0),
    mode = 'markers',
    name = 'markers'
)
data = [ trace1, trace2, trace3, trace4, trace0]

# Plot and embed in ipython notebook!
py.iplot(data, filename='fig_8')

We can see that the clear trend is gone, and just from the picture it is not easy to see that there is a true positive relationship. This might be the reason why the contour histogram was more curvy for the pure random-trend approach.

Mix (FD-FE):

dwx_s = dx.reshape(4,200) - dx_mean_i.reshape(4,200)
dwy_s = dy.reshape(4,200) - dy_mean_i.reshape(4,200)

dwx_df_s = pd.DataFrame(dwx_s.reshape(4,200))
dwy_df_s = pd.DataFrame(dwy_s.reshape(4,200))
# Create traces

trace1 = go.Scatter(
    x = dwx_df_s[v].values,
    y = dwy_df_s[v].values,
    mode = 'lines+markers',
    name = 'lines+markers'
)
trace2 = go.Scatter(
    x = dwx_df_s[v+1].values,
    y = dwy_df_s[v+1].values,
    mode = 'lines+markers',
    name = 'lines+markers'
)
trace3 = go.Scatter(
    x = dwx_df_s[v+2].values,
    y = dwy_df_s[v+2].values,
    mode = 'lines+markers',
    name = 'lines+markers'
)
trace4 = go.Scatter(
    x = dwx_df_s[v+3].values,
    y = dwy_df_s[v+3].values,
    mode = 'lines+markers',
    name = 'lines+markers'
)
trace0 = go.Scatter(
    x = np.concatenate((dwx_df_s[v], dwx_df_s[v+1], dwx_df_s[v+2], dwx_df_s[v+3]), axis=0),
    y = np.concatenate((dwy_df_s[v], dwy_df_s[v+1], dwy_df_s[v+2], dwy_df_s[v+3]), axis=0),
    mode = 'markers',
    name = 'markers'
)
data = [ trace1, trace2, trace3, trace4, trace0]

# Plot and embed in ipython notebook!
py.iplot(data, filename='fig_9')

Here the picture seems a bit clearer and truly different from the other approach. From this it does seem a bit odd that the results (later on) are so similar, but we only look at four individuals here, which might also be the reason for the small difference in the results.

3.3. Regression

######### Regression
# First-Difference
x = np.mat(dx).T
y = np.mat(dy).T
est_fd = (x.T * x).I * x.T * y

x2 = np.mat(d2x).T
y2 = np.mat(d2y).T
est_pure = (x2.T * x2).I * x2.T * y2

xw = np.mat(dwx).T
yw = np.mat(dwy).T
est_mix = (xw.T * xw).I * xw.T * yw

res = np.array((est_fd, est_pure, est_mix))
res = np.mat(res)
res_df = pd.DataFrame(res)
res_df.columns = ["FD", "RT-pure", "RT-mix",]
res_df.index = ["Beta"]
print res_df

########################## Possible OLS-regression packages we decided against. We keep the code here in case we want to use it later on.
#model = sm.OLS(dy, dx)
#results = model.fit()
#print(results.summary())
#print('Parameters: ', results.params)

#model = sm.OLS(d2y, d2x)
#results = model.fit()
#print(results.summary())
#print('Parameters: ', results.params)
           FD   RT-pure    RT-mix
Beta  3.00025  1.991141  1.972837

The first two entries show a similar picture to the Stata results. The FD estimate is upward biased due to the individual-specific linear trend. The pure estimation solves the endogeneity problem and gives a coefficient very close to the true value. The mix approach, where we first take the first-difference and then subtract the individual-specific mean, does not perform quite as well: the coefficient is 1.972837, so biased by roughly 0.03. Looking into the Stata help file of the xtreg command shows that Stata does not just use the within-transformation, but also adds back the overall average:


$$y_{it} - \bar{y}_{i} + \bar{\bar{y}} = \beta \cdot (x_{it} - \bar{x}_{i} + \bar{\bar{x}}) + \alpha_i - \alpha_i + \bar{\bar{\alpha}} + e_{it} - \bar{e}_{i} + \bar{\bar{e}}$$

with $\bar{y}_{i} = \sum_{t=1}^T y_{it}/T$ and $\bar{\bar{y}} = \sum_{i}\sum_{t} y_{it}/(nT)$ for our balanced panel, and analogously for the other variables.

This, however, should only make it possible to calculate an intercept and affect the standard errors. Another possibility is that the particular seed was simply unfavourable for the mix method. This strongly motivates a Monte Carlo study.
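As a quick check of this point, the following minimal sketch (our own, using numpy's least-squares routine on the transformed variables from above) regresses the mix-transformed data once as they are and once with overall averages added back, each time including an intercept:

# Sketch only: adding back overall averages should not change the slope once an intercept is included
Xc = np.column_stack((np.ones(len(dwx)), dwx))                    # within-transformed differences plus a constant
slope_plain = np.linalg.lstsq(Xc, dwy)[0][1]
Xc_shift = np.column_stack((np.ones(len(dwx)), dwx + dx.mean()))  # shift x by an overall average
slope_shift = np.linalg.lstsq(Xc_shift, dwy + dy.mean())[0][1]    # shift y by an overall average
print slope_plain, slope_shift                                    # the two slopes coincide (up to floating point)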

As noted before, we will now run the regression with 1,000 replications to see how the estimators behave.

3.4 Monte Carlo Study

r = 1000 # number of replications of the simulation

#### empty matrices generated to store the results at the end of the loop
# for first-difference
estimate_fd = np.mat(np.empty((1, r))) 
estimate_mat_fd = np.mat(np.empty((4, r))) 
beta_fd_hat = np.zeros(4)
se_fd_hat = np.zeros(4)

# for random-trend (1) (pure)
estimate_rt_p = np.mat(np.empty((1, r)))
estimate_mat_p = np.mat(np.empty((4, r))) 
beta_rt_p_hat = np.zeros(4)
se_rt_p_hat = np.zeros(4)

# for random-trend (2) (mix)
estimate_rt_m = np.mat(np.empty((1, r)))
estimate_mat_m = np.mat(np.empty((4, r))) 
beta_rt_m_hat = np.zeros(4)
se_rt_m_hat = np.zeros(4)
####

l = 0 # Used to move within a vector to store results at the end
n = 200 # defines the number of individuals
T = 5 # defines the number of time periods
N = n * T # maximum number of observations, necessary for the calculations
intercept = 3 # Even though the intercept cancels out, we leave it in the model because it was in the basic model
for beta in (-5, -1, 0.5, 2): # Here we actually run the code for several betas: -5, -1, 0.5 and 2
    # in Stata this would be `foreach beta of numlist -5 -1 0.5 2`
    for j in range(0, r): # Loop for the different amounts of replications of the simulation, done for each beta     
        # DGP
        np.random.seed(j) # Seed numpy's generator inside the loop so each replication uses a different but known seed
        alpha_i = np.random.normal(0, 1, n)
        alpha_i = np.repeat(alpha_i,T) 
        i = np.arange(n) + 1 # +1 so that the individual index (and hence the trend slope) starts at 1, as in the single run above
        i = np.repeat(i,T)
        t = np.arange(T) + 1 
        t = np.tile(t,n)
        trend = t*i 
        mean = [0, 0]
        cov = [[1, 0], [0, 1]]
        nu_it, e_it = np.random.multivariate_normal(mean, cov, N).T 
        x_it = (nu_it + alpha_i + trend).T 
        y_it = (intercept + alpha_i + beta*x_it + trend + e_it).T # <=> `g y_it = 3 + alpha_i + beta*x_it + trend + e_it`
        
        ############################### Transformations for the first-difference and random-trend (1) and (2)
        
        # First-difference and random-trend (1)
        x_reshape = np.reshape(np.ravel(x_it), (n, T)).T 
        dx = np.zeros(x_reshape.shape)
        dx[1:] = x_reshape[1:] - x_reshape[:-1] 
        dx = np.delete(dx,(0), axis=0) 
        dx_reshape = dx # result is stored for the within-transformation
        d2x = np.zeros(dx.shape) 
        d2x[1:] = dx[1:] - dx[:-1] 
        d2x = np.delete(d2x,(0), axis=0)
        
        dX = np.reshape(dx.T, N-n) 
        d2X = np.reshape(d2x.T, N-2*n)

        y_reshape = np.reshape(np.ravel(y_it), (n, T)).T 
        dy = np.zeros(y_reshape.shape)
        dy[1:] = y_reshape[1:] - y_reshape[:-1] 
        dy = np.delete(dy,(0), axis=0)
        dy_reshape = dy # result is stored for the within-transformation
        d2y = np.zeros(dy.shape)
        d2y[1:] = dy[1:] - dy[:-1] 
        d2y = np.delete(d2y,(0), axis=0)
        dY = np.reshape(dy.T, N-n)
        d2Y = np.reshape(d2y.T, N-2*n)
        
        # Within-transformation
        dx_mean = dx_reshape.mean(axis=0) 
        dx_mean_i = np.repeat(dx_mean,T-1) 
        dx_mean_i_reshape = np.reshape(dx_mean_i.T, N-n) 
        dwX = dX - dx_mean_i_reshape
        # dwX is now constructed for the Random-Trend 2, mix
        # Same for the y-variable
        dy_mean = dy_reshape.mean(axis=0) 
        dy_mean_i = np.repeat(dy_mean,T-1) 
        dy_mean_i_reshape = np.reshape(dy_mean_i.T, N-n) 
        dwY = dY - dy_mean_i_reshape 
        # dwY is now constructed for the Random-Trend 2, mix
        
        ################# First-Difference Regression
        X = np.mat(dX).T
        Y = np.mat(dY).T
        M = (X.T * X).I * X.T
        estimate_fd[:, j] = M * Y
        estimate_mat_fd[l] = estimate_fd
        
        ################# Random-Trend (1), pure Regression
        X2 = np.mat(d2X).T
        Y2 = np.mat(d2Y).T
        M2 = (X2.T * X2).I * X2.T
        estimate_rt_p[:, j] = M2 * Y2
        estimate_mat_p[l] = estimate_rt_p
        
        ################# Random-Trend (2), mix Regression
        Xw = np.mat(dwX).T
        Yw = np.mat(dwY).T
        Mw = (Xw.T * Xw).I * Xw.T
        estimate_rt_m[:, j] = Mw * Yw 
        estimate_mat_m[l] = estimate_rt_m
        
        
        
    mu_fd = estimate_fd.mean()
    var_fd = estimate_fd.var()
    sigma_fd = np.sqrt(var_fd)
    beta_fd_hat[l] = mu_fd
    se_fd_hat[l] = sigma_fd
    
    mu_rt_p = estimate_rt_p.mean()
    var_rt_p = estimate_rt_p.var()
    sigma_rt_p = np.sqrt(var_rt_p)
    beta_rt_p_hat[l] = mu_rt_p
    se_rt_p_hat[l] = sigma_rt_p
    
    mu_rt_m = estimate_rt_m.mean()
    var_rt_m = estimate_rt_m.var()
    sigma_rt_m = np.sqrt(var_rt_m)
    beta_rt_m_hat[l] = mu_rt_m
    se_rt_m_hat[l] = sigma_rt_m
    
    l = l + 1
    

#estimate_fd = np.squeeze(np.asarray(estimate_fd))


#estimate_rt_p = np.squeeze(np.asarray(estimate_rt_p))


#estimate_rt_m = np.squeeze(np.asarray(estimate_rt_m))


results = np.matrix((beta_fd_hat, se_fd_hat, beta_rt_p_hat, se_rt_p_hat, beta_rt_m_hat, se_rt_m_hat))
results_df = pd.DataFrame(results)
results_df.index = ["FD_beta", "FD_se", "RT-1_beta", "RT-1_se", "RT-2_beta", "RT-2_se"]
results_df.columns = ["-5", "-1", "0.5", "2"]
print results_df
                 -5        -1       0.5         2
FD_beta   -4.000153 -0.000150  1.499849  2.999850
FD_se      0.000295  0.000314  0.000304  0.000308
RT-1_beta -5.001473 -1.000953  0.502678  2.001597
RT-1_se    0.052119  0.051656  0.051588  0.052766
RT-2_beta -5.001120 -1.000302  0.502618  2.000040
RT-2_se    0.044748  0.043349  0.042506  0.044230

In the output table one can see the estimated coefficient β for the first-difference and for the random-trend pure (1) and mix (2) approaches. Further, we include the standard deviation of the distribution of the estimated coefficients across the replications (so this is not the mean of the individual standard errors, but the one standard error of the coefficients). The column headers show the true value of the coefficient.
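Formally, for each true β the table reports the Monte Carlo mean and standard deviation over the R = 1,000 replications,

$$\bar{\hat{\beta}} = \frac{1}{R}\sum_{r=1}^{R}\hat{\beta}_r, \qquad \widehat{se}(\hat{\beta}) = \sqrt{\frac{1}{R}\sum_{r=1}^{R}\left(\hat{\beta}_r - \bar{\hat{\beta}}\right)^2}$$

which is exactly what .mean() and np.sqrt(.var()) compute in the loop above.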

We see that the RT-mix is now not much different from the pure estimation, and that both are unbiased; the differences are only slight. The FD is biased, though. Looking at the standard errors, using the second data transformation increases the spread of the estimates across the replications: within each method they are quite constant, but both RT variants have much higher standard errors than the FD.

For a better overview of the results, the next cells make some comparisons. First we calculate the differences between the estimated $\hat{\beta}$ (given by beta_fd_hat for the first-difference) and the true β (given by beta).

3.4.1 Comparison of the Monte Carlo Results

beta = np.array((-5, -1, 0.5, 2))
diff_fd = beta_fd_hat - beta
diff_fd_r = np.round(np.absolute(diff_fd),0)

diff_rt_p = beta_rt_p_hat - beta
diff_rt_p_r = np.round(np.absolute(diff_rt_p),0)

diff_rt_m = beta_rt_m_hat - beta
diff_rt_m_r = np.round(np.absolute(diff_rt_m),0)

diff_se_rt = se_rt_p_hat - se_rt_m_hat        
        

To see whether the differences from the true coefficients are large, we perform a simple t-type test: we divide each difference by the corresponding standard error and let Python tell us whether the ratio exceeds the critical value of 1.96:

print ""
print "Test if the difference between estimator of the first-difference and the true value is significant unequal to zero" 
for l in range(0,4):
    if np.abs(diff_fd[l]) / se_fd_hat[l] > 1.96:
        print "Reject"
    else:
        print "Don't reject"  
print ""
print "Test if the difference between estimator of the random-trend (1) and the true value is significant unequal to zero"         
for l in range(0,4):
    if np.abs(diff_rt_p[l]) / se_rt_p_hat[l] > 1.96:
        print "Reject"
    else:
        print "Don't reject" 

print ""
print "Test if the difference between estimator of the random-trend (2) and the true value is significant unequal to zero"                 
for l in range(0,4):
    if np.abs(diff_rt_m[l]) / se_rt_m_hat[l] > 1.96:
        print "Reject"
    else:
        print "Don't reject"         
        
Test if the difference between estimator of the first-difference and the true value is significant unequal to zero
Reject
Reject
Reject
Reject

Test if the difference between estimator of the random-trend (1) and the true value is significant unequal to zero
Don't reject
Don't reject
Don't reject
Don't reject

Test if the difference between estimator of the random-trend (2) and the true value is significant unequal to zero
Don't reject
Don't reject
Don't reject
Don't reject

For the test we used the 95% quantile. We conclude that both the pure and the mix random-trend give consistent estimates, while the first-difference (as expected) gives inconsistent results. Thereby we confirm the results we obtained from Stata above. The code we wrote seems to work, in the sense that the transformations are done correctly.

To see, however, whether pure or mix is better, we check which approach produces the smaller difference from the true coefficient and which produces the smaller standard errors.

print "Which method produces a smaller differences with the true value?"
for l in range(0,4):
    if diff_rt_p[l] < diff_rt_m[l]:
        print "Pure"
    else:
        print "Mix" 

print ""        
print "Which method produces smaller standard errors?"        
for l in range(0,4):
    if se_rt_p_hat[l] < se_rt_m_hat[l]:
        print "Pure"
    else:
        print "Mix" 
        
print ""
print "Difference of the standard errors:"
print diff_se_rt
Which method produces a smaller difference from the true value?
Pure
Pure
Mix
Mix

Which method produces smaller standard errors?
Mix
Mix
Mix
Mix

Difference of the standard errors:
[ 0.0073705   0.0083073   0.00908187  0.00853632]

For our simulation, it does not seem to make much of a difference for the mean of the coefficient whether one uses the first or the second random-trend approach: the pure approach has the smaller difference twice and the mix approach has the smaller difference twice. The differences are overall very small, so small that in empirical papers it should not matter.

For the standard error, however, the mix approach is always preferred. This means that over the 1,000 replications the mix approach is more accurate than the pure approach. The difference in the standard errors appears in the third decimal place, which, from our point of view, does not seem small enough to neglect.

We therefore conclude that it might be reasonable to suggest the second random-trend approach, in which one first takes the first-difference to cancel out the individual constant and turn the linear trend into an individual-specific constant, and then uses the within-transformation to cancel out that remaining constant.
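In formulas, writing the random-trend model with an individual intercept $c_i$ and an individual trend slope $g_i$ (our notation here), the logic of this mix approach is

$$y_{it} = c_i + g_i t + \beta x_{it} + e_{it} \quad\Rightarrow\quad \Delta y_{it} = g_i + \beta \Delta x_{it} + \Delta e_{it}$$

so that after first-differencing the individual trend has collapsed into an individual-specific constant $g_i$, which the subsequent within-transformation (or a second difference) removes.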

Finally, we also take a graphical look at our estimates.

3.5 Graphical Comparison of the Estimation Methods

########## Restoring the data to panda data frames for the plot
#fd
estimate_fd_df = pd.DataFrame(estimate_mat_fd).T

#mix
estimate_m_df = pd.DataFrame(estimate_mat_m).T

#pure
estimate_p_df = pd.DataFrame(estimate_mat_p).T

########## Generating plots using plotly

cf.set_config_file(offline=False, world_readable=True, theme='pearl')

df = pd.DataFrame({'FD': estimate_fd_df[0],
                   'FD-FE': estimate_m_df[0],
                   'FD-FD': estimate_p_df[0]})
#df.head(2)

df.iplot(kind='histogram', subplots=True, shape=(1, 3),
         title = "Histogram for estimates with true beta = (-5)",
         bins = 30,
         subplot_titles=('FD Estimation', 'FD-FE Estimation','FD-FD Estimation'),
         filename='cufflinks/fig_1')

# See https://plot.ly/ipython-notebooks/cufflinks/ for further options

The histograms show that for all three methods the distribution of the estimates looks similar to a normal distribution, which was to be expected. The estimation therefore behaves nicely and gives us confidence that there are no major errors in the DGP or in the by-hand regressions.

For a better comparison we plot the mix (yellow) and pure (blue) estimates within one plot to show the differences. Since the FD estimation is biased and would only make the graphical comparison more difficult, we do not include it in the following plots.

Overlay histogram with true β equal to (-5)

###### True beta = -5

cf.set_config_file(offline=False, world_readable=True, theme='pearl')

df = pd.DataFrame({'FD-FE': estimate_m_df[0],
                   'FD-FD': estimate_p_df[0]})
#df.head(2)

df.iplot(kind='histogram',barmode='overlay', 
         histnorm='percent', 
         title = "Overlay-Histogram for RT mix and pure, true beta = (-5)",
         filename='cufflinks/fig_2')

Overlay histogram with true β equal to (-1)

###### True beta = -1

cf.set_config_file(offline=False, world_readable=True, theme='pearl')

df = pd.DataFrame({'FD-FE': estimate_m_df[1],
                   'FD-FD': estimate_p_df[1]})
#df.head(2)

df.iplot(kind='histogram',barmode='overlay', 
         histnorm='percent', 
         title = "Overlay-Histogram for RT mix and pure, true beta = (-1)",
         filename='cufflinks/fig_3')

Overlay histogram with true β equal to (0.5)

###### True beta = 0.5

cf.set_config_file(offline=False, world_readable=True, theme='pearl')

df = pd.DataFrame({'FD-FE': estimate_m_df[2],
                   'FD-FD': estimate_p_df[2]})
#df.head(2)

df.iplot(kind='histogram',barmode='overlay', 
         histnorm='percent', 
         title = "Overlay-Histogram for RT mix and pure, true beta = 0.5",
         filename='cufflinks/fig_4')

Overlay histogram with true β equal to 2

###### True beta = 2

cf.set_config_file(offline=False, world_readable=True, theme='pearl')

df = pd.DataFrame({'FD-FE': estimate_m_df[3],
                   'FD-FD': estimate_p_df[3]})
#df.head(2)

df.iplot(kind='histogram',barmode='overlay', 
         histnorm='percent', 
         title = "Overlay-Histogram for RT mix and pure, true beta = 2",
         filename='cufflinks/fig_5')

One can see that in all four graphs the mix estimates are more concentrated around the center, with a higher peak and thinner tails than the pure estimates (neglecting the two outliers to the right for the mix).

This strengthens our conclusions from the comparison of the means and standard errors: the mix method seems to perform better in the sense that its estimates are more narrowly concentrated around the true value.

4. Conclusion

In this study we have looked at some foundational panel data models. Using both commercial and open-source software packages, we performed numerical simulations based on a predefined data generating process. Our interest was first of all to find an unbiased estimator after a linear random trend has introduced bias in the classical panel estimation methods. We evaluated the consistency and effectiveness of several alternative approaches for estimating a model that contains a random-trend component. We compared two competing theoretical approaches for getting rid of the trend: (i) first-differencing the data twice and (ii) the within transformation after first-differencing once.

We have evaluated the robustness of the two approaches. Also, we have compared the effectiveness of Stata versus Python-based open-source libraries, including numpy and pandas.

4.1. Comparison between models

We have found that a random-trend model solves the endogeneity problem imposed by the linear trend. The RT estimation is superior to the FD and FE estimations because the latter can only solve an endogeneity problem induced by a constant. However, we saw that the first-difference works if the trend is the same for all individuals or if its marginal increase is decreasing (log). Further, the random-trend estimator is as biased as the first-difference if the trend increases exponentially over time. More research has to be done here for a more precise view.

The comparison of the different RT methods suggests that the mix approach (FD-FE) is superior as long as the calculation time is not much of a burden. Given the bad results for nonlinear trends, it is questionable whether the RT estimation model can be used in empirical work at all if there is reason to believe that the trend may, for example, be exponential. As a result, more research has to be done on the estimation of nonlinear trends.

4.2. Comparison between Stata and Python

We found several advantages of Stata over open-source Python libraries:

- Stata is easier to set up because of the richness of implemented commands designed for econometric work. As a result, it is faster to code when one wants to run a simple or complex regression.
- It offers a huge amount of online examples and is well documented.

Some disadvantages of using Stata (in the notebook) can be listed as well:

- It does not save background estimation results (originally stored in vectors or matrices in Stata).
- It does not synchronize estimation output smoothly with other Python packages (e.g. for creating plots in plotly).
- It seems to be slower than running the same code within Stata.
- Not all commands work.

Still, if one is more comfortable with Stata, or has old code that can be reused, the ipystata add-in can be a convenient addition to the econometrician's toolbox. In those cases, as we found out, one can transition smoothly and gradually translate more and more code from Stata to Python, and thus get the best out of both.

Python Pros:

- There is wide debate about the advantages of Python for scientific research. Instead of naming them all here as well, we recommend this blog post.

Python Cons:

- There is a steep learning curve, and one needs to be able to code really well to utilize its full potential.
- We did not find any library that was developed specifically for complex econometric work (e.g. panel data). Of course, we hope that it is only a matter of time until this gets sorted out.

We failed to find a package to measure the time Python needs to run. For Stata, the FD is clearly much faster than the FE estimation, and thereby pure is faster than mix. A comparison is also difficult because, while coding in Python, it was convenient to run all three transformations (FD, FD-FE, FD-FD) within one cell. Just from our impression, the code runs very fast for all of them and might even be faster than the FE estimation in Stata. However, we do the estimation ourselves and calculate only the coefficients we need, whereas Stata usually estimates much more than just the coefficient, which of course costs time as well.
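For reference, a minimal sketch of how a cell could be timed with Python's built-in time module (part of the standard library rather than a dedicated package), roughly analogous to Stata's timer commands:

import time
start = time.time()            # analogous to `timer on 1` in Stata
# ... run the Monte Carlo loop from section 3.4 here ...
elapsed = time.time() - start  # analogous to `timer off 1` and `timer list 1`
print "The time this code runs is", round(elapsed, 4), "seconds"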

The last step of our journey is to convert this Python notebook into HTML. This can be done easily by executing the following command at the command prompt:

jupyter nbconvert Third_Assignment.ipynb --to html --template full