Start fresh ...
clear all
Reversal of Fortune: Geography and Institutions in the Making of the Modern World Income Distribution.. Daron Acemoglu, Simon Johnson, and James A. Robinson. Quarterly Journal of Economics, 117, November 2002: pp. 1231-1294.,
In this 2002 article, Acemoglu, Johnson, and Robinson argue that countries that were more wealthy and urbanized in the 1500s saw their fortunes reverse in the subsequent centuries. Countries such as Rwanda and Tanzania were high-density areas in the 1500s but in the 20th century had low GDP per capita. The authors argue that this is because European colonialism settled more in areas that were less developed in the 1500s, but then went on to become strong economies. A simple bivariate relationship motivates their argument.
. use reversal.dta, clear
. describe
Contains data from reversal.dta
obs: 91
vars: 10 26 Nov 2018 22:42
size: 7,553
───────────────────────────────────────────────────────────────────────────────────────────
storage display value
variable name type format label variable label
───────────────────────────────────────────────────────────────────────────────────────────
countryn str20 %-9s Country Name
shortnam str3 %-9s Country Name
logpgp95 double %10.0g Log GDP per Capita in 1995
logem4 double %10.0g
urbz1995 double %10.0g Urbanization in 1995 (Proportion Population
in Large Towns)
lpd1500s double %10.0g Log Population Density 1500s
cu1500 double %10.0g Urbanization in 1000s (Chandler)
sjb1500 double %10.0g Urbanization in 1500s (Bairoch)
sjb1000 double %10.0g Urbanization in 1000s (Bairoch)
continent long %8.0g continent
───────────────────────────────────────────────────────────────────────────────────────────
Sorted by:
Use the regress or reg command to run a regression.
. regress logpgp95 lpd1500s
Source │ SS df MS Number of obs = 91
─────────────┼────────────────────────────────── F(1, 89) = 46.12
Model │ 30.3661927 1 30.3661927 Prob > F = 0.0000
Residual │ 58.5990948 89 .658416795 R-squared = 0.3413
─────────────┼────────────────────────────────── Adj R-squared = 0.3339
Total │ 88.9652874 90 .988503194 Root MSE = .81143
─────────────┬────────────────────────────────────────────────────────────────
logpgp95 │ Coef. Std. Err. t P>|t| [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
lpd1500s │ -.3766786 .0554659 -6.79 0.000 -.4868881 -.266469
_cons │ 8.090425 .0887273 91.18 0.000 7.914126 8.266725
─────────────┴────────────────────────────────────────────────────────────────
In a simple regression, a scatter plot can show the data cleanly
. twoway scatter logpgp95 lpd1500s || lfit logpgp95 lpd1500s, /// > ytitle(1950 GDP per Capita) /// > xtitle(1500s Population Density) /// > legend(off) . graph export reversal_fit.png, width(2000) replace (file reversal_fit.png written in PNG format)
The Best Fit Regression Line
Different options:
. regress logpgp95 lpd1500s, beta
Source │ SS df MS Number of obs = 91
─────────────┼────────────────────────────────── F(1, 89) = 46.12
Model │ 30.3661927 1 30.3661927 Prob > F = 0.0000
Residual │ 58.5990948 89 .658416795 R-squared = 0.3413
─────────────┼────────────────────────────────── Adj R-squared = 0.3339
Total │ 88.9652874 90 .988503194 Root MSE = .81143
─────────────┬────────────────────────────────────────────────────────────────
logpgp95 │ Coef. Std. Err. t P>|t| Beta
─────────────┼────────────────────────────────────────────────────────────────
lpd1500s │ -.3766786 .0554659 -6.79 0.000 -.5842314
_cons │ 8.090425 .0887273 91.18 0.000 .
─────────────┴────────────────────────────────────────────────────────────────
Multiple regression -- add more terms.
. reg logpgp95 lpd1500s sjb1000
Source │ SS df MS Number of obs = 34
─────────────┼────────────────────────────────── F(2, 31) = 21.66
Model │ 14.2886648 2 7.1443324 Prob > F = 0.0000
Residual │ 10.2267689 31 .329895772 R-squared = 0.5828
─────────────┼────────────────────────────────── Adj R-squared = 0.5559
Total │ 24.5154337 33 .742891931 Root MSE = .57437
─────────────┬────────────────────────────────────────────────────────────────
logpgp95 │ Coef. Std. Err. t P>|t| [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
lpd1500s │ -.321548 .0551574 -5.83 0.000 -.4340423 -.2090538
sjb1000 │ -.0031051 .0284966 -0.11 0.914 -.0612242 .055014
_cons │ 8.642748 .1485872 58.17 0.000 8.339703 8.945794
─────────────┴────────────────────────────────────────────────────────────────
What if explanatory variable is a categorical?
. tab continent
continent │ Freq. Percent Cum.
────────────┼───────────────────────────────────
Africa │ 45 49.45 49.45
Americas │ 32 35.16 84.62
Asia │ 12 13.19 97.80
Oceania │ 2 2.20 100.00
────────────┼───────────────────────────────────
Total │ 91 100.00
. reg logpgp95 lpd1500s sjb1000 continent
Source │ SS df MS Number of obs = 34
─────────────┼────────────────────────────────── F(3, 30) = 13.97
Model │ 14.2889201 3 4.76297337 Prob > F = 0.0000
Residual │ 10.2265136 30 .340883787 R-squared = 0.5829
─────────────┼────────────────────────────────── Adj R-squared = 0.5411
Total │ 24.5154337 33 .742891931 Root MSE = .58385
─────────────┬────────────────────────────────────────────────────────────────
logpgp95 │ Coef. Std. Err. t P>|t| [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
lpd1500s │ -.3212955 .0568227 -5.65 0.000 -.437343 -.205248
sjb1000 │ -.0028831 .0300819 -0.10 0.924 -.0643186 .0585524
continent │ .0041495 .1516197 0.03 0.978 -.3054991 .3137982
_cons │ 8.632706 .3968173 21.75 0.000 7.822297 9.443115
─────────────┴────────────────────────────────────────────────────────────────
(what's wrong with this?)
. reg logpgp95 lpd1500s sjb1000 i.continent
Source │ SS df MS Number of obs = 34
─────────────┼────────────────────────────────── F(5, 28) = 11.84
Model │ 16.6431982 5 3.32863964 Prob > F = 0.0000
Residual │ 7.87223551 28 .281151268 R-squared = 0.6789
─────────────┼────────────────────────────────── Adj R-squared = 0.6215
Total │ 24.5154337 33 .742891931 Root MSE = .53024
─────────────┬────────────────────────────────────────────────────────────────
logpgp95 │ Coef. Std. Err. t P>|t| [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
lpd1500s │ -.4108716 .0664944 -6.18 0.000 -.5470792 -.2746641
sjb1000 │ .017904 .0287151 0.62 0.538 -.0409162 .0767241
│
continent │
Americas │ -.9076474 .3537367 -2.57 0.016 -1.632244 -.1830505
Asia │ -.5490911 .3531067 -1.56 0.131 -1.272397 .1742151
Oceania │ -.3875177 .548546 -0.71 0.486 -1.511163 .7361279
│
_cons │ 9.260819 .3231567 28.66 0.000 8.598863 9.922776
─────────────┴────────────────────────────────────────────────────────────────
Create a new variable in dataset
. reg logpgp95 lpd1500s sjb1000 i.continent
Source │ SS df MS Number of obs = 34
─────────────┼────────────────────────────────── F(5, 28) = 11.84
Model │ 16.6431982 5 3.32863964 Prob > F = 0.0000
Residual │ 7.87223551 28 .281151268 R-squared = 0.6789
─────────────┼────────────────────────────────── Adj R-squared = 0.6215
Total │ 24.5154337 33 .742891931 Root MSE = .53024
─────────────┬────────────────────────────────────────────────────────────────
logpgp95 │ Coef. Std. Err. t P>|t| [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
lpd1500s │ -.4108716 .0664944 -6.18 0.000 -.5470792 -.2746641
sjb1000 │ .017904 .0287151 0.62 0.538 -.0409162 .0767241
│
continent │
Americas │ -.9076474 .3537367 -2.57 0.016 -1.632244 -.1830505
Asia │ -.5490911 .3531067 -1.56 0.131 -1.272397 .1742151
Oceania │ -.3875177 .548546 -0.71 0.486 -1.511163 .7361279
│
_cons │ 9.260819 .3231567 28.66 0.000 8.598863 9.922776
─────────────┴────────────────────────────────────────────────────────────────
. predict yhat
(option xb assumed; fitted values)
(57 missing values generated)
. summarize logpgp95 yhat
Variable │ Obs Mean Std. Dev. Min Max
─────────────┼─────────────────────────────────────────────────────────
logpgp95 │ 91 7.918999 .994235 6.109248 10.21574
yhat │ 34 8.614128 .7101685 7.443351 10.37231
Or residuals
. predict residuals, residuals
(57 missing values generated)
.
. summarize logpgp95 yhat residuals
Variable │ Obs Mean Std. Dev. Min Max
─────────────┼─────────────────────────────────────────────────────────
logpgp95 │ 91 7.918999 .994235 6.109248 10.21574
yhat │ 34 8.614128 .7101685 7.443351 10.37231
residuals │ 34 -2.74e-09 .4884185 -1.083866 .8918947
Instead of copy-pasting Stata output, use designated commands:
. reg logpgp95 lpd1500s sjb1000
Source │ SS df MS Number of obs = 34
─────────────┼────────────────────────────────── F(2, 31) = 21.66
Model │ 14.2886648 2 7.1443324 Prob > F = 0.0000
Residual │ 10.2267689 31 .329895772 R-squared = 0.5828
─────────────┼────────────────────────────────── Adj R-squared = 0.5559
Total │ 24.5154337 33 .742891931 Root MSE = .57437
─────────────┬────────────────────────────────────────────────────────────────
logpgp95 │ Coef. Std. Err. t P>|t| [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
lpd1500s │ -.321548 .0551574 -5.83 0.000 -.4340423 -.2090538
sjb1000 │ -.0031051 .0284966 -0.11 0.914 -.0612242 .055014
_cons │ 8.642748 .1485872 58.17 0.000 8.339703 8.945794
─────────────┴────────────────────────────────────────────────────────────────
. esttab
────────────────────────────
(1)
logpgp95
────────────────────────────
lpd1500s -0.322***
(-5.83)
sjb1000 -0.00311
(-0.11)
_cons 8.643***
(58.17)
────────────────────────────
N 34
────────────────────────────
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
.
. eststo clear // clear estimated tables
. eststo: reg logpgp95 lpd1500s
Source │ SS df MS Number of obs = 91
─────────────┼────────────────────────────────── F(1, 89) = 46.12
Model │ 30.3661927 1 30.3661927 Prob > F = 0.0000
Residual │ 58.5990948 89 .658416795 R-squared = 0.3413
─────────────┼────────────────────────────────── Adj R-squared = 0.3339
Total │ 88.9652874 90 .988503194 Root MSE = .81143
─────────────┬────────────────────────────────────────────────────────────────
logpgp95 │ Coef. Std. Err. t P>|t| [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
lpd1500s │ -.3766786 .0554659 -6.79 0.000 -.4868881 -.266469
_cons │ 8.090425 .0887273 91.18 0.000 7.914126 8.266725
─────────────┴────────────────────────────────────────────────────────────────
(est1 stored)
. eststo: reg logpgp95 lpd1500s sjb1000
Source │ SS df MS Number of obs = 34
─────────────┼────────────────────────────────── F(2, 31) = 21.66
Model │ 14.2886648 2 7.1443324 Prob > F = 0.0000
Residual │ 10.2267689 31 .329895772 R-squared = 0.5828
─────────────┼────────────────────────────────── Adj R-squared = 0.5559
Total │ 24.5154337 33 .742891931 Root MSE = .57437
─────────────┬────────────────────────────────────────────────────────────────
logpgp95 │ Coef. Std. Err. t P>|t| [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
lpd1500s │ -.321548 .0551574 -5.83 0.000 -.4340423 -.2090538
sjb1000 │ -.0031051 .0284966 -0.11 0.914 -.0612242 .055014
_cons │ 8.642748 .1485872 58.17 0.000 8.339703 8.945794
─────────────┴────────────────────────────────────────────────────────────────
(est2 stored)
. eststo: reg logpgp95 lpd1500s sjb1000 i.continent
Source │ SS df MS Number of obs = 34
─────────────┼────────────────────────────────── F(5, 28) = 11.84
Model │ 16.6431982 5 3.32863964 Prob > F = 0.0000
Residual │ 7.87223551 28 .281151268 R-squared = 0.6789
─────────────┼────────────────────────────────── Adj R-squared = 0.6215
Total │ 24.5154337 33 .742891931 Root MSE = .53024
─────────────┬────────────────────────────────────────────────────────────────
logpgp95 │ Coef. Std. Err. t P>|t| [95% Conf. Interval]
─────────────┼────────────────────────────────────────────────────────────────
lpd1500s │ -.4108716 .0664944 -6.18 0.000 -.5470792 -.2746641
sjb1000 │ .017904 .0287151 0.62 0.538 -.0409162 .0767241
│
continent │
Americas │ -.9076474 .3537367 -2.57 0.016 -1.632244 -.1830505
Asia │ -.5490911 .3531067 -1.56 0.131 -1.272397 .1742151
Oceania │ -.3875177 .548546 -0.71 0.486 -1.511163 .7361279
│
_cons │ 9.260819 .3231567 28.66 0.000 8.598863 9.922776
─────────────┴────────────────────────────────────────────────────────────────
(est3 stored)
. esttab
────────────────────────────────────────────────────────────
(1) (2) (3)
logpgp95 logpgp95 logpgp95
────────────────────────────────────────────────────────────
lpd1500s -0.377*** -0.322*** -0.411***
(-6.79) (-5.83) (-6.18)
sjb1000 -0.00311 0.0179
(-0.11) (0.62)
1.continent 0
(.)
2.continent -0.908*
(-2.57)
3.continent -0.549
(-1.56)
4.continent -0.388
(-0.71)
_cons 8.090*** 8.643*** 9.261***
(91.18) (58.17) (28.66)
────────────────────────────────────────────────────────────
N 91 34 34
────────────────────────────────────────────────────────────
t statistics in parentheses
* p<0.05, ** p<0.01, *** p<0.001
With more formats (http://repec.org/bocode/e/estout/esttab.html)
. esttab, label b(2) se(a1) r2 ar2
────────────────────────────────────────────────────────────────────
(1) (2) (3)
Log GDP~1995 Log GDP~1995 Log GDP~1995
────────────────────────────────────────────────────────────────────
Log Population~1500s -0.38*** -0.32*** -0.41***
(0.06) (0.06) (0.07)
Urbanization in 10~) -0.00 0.02
(0.03) (0.03)
Africa 0.00
(.)
Americas -0.91*
(0.4)
Asia -0.55
(0.4)
Oceania -0.39
(0.5)
Constant 8.09*** 8.64*** 9.26***
(0.09) (0.1) (0.3)
────────────────────────────────────────────────────────────────────
Observations 91 34 34
R-squared 0.341 0.583 0.679
Adjusted R-squared 0.334 0.556 0.622
────────────────────────────────────────────────────────────────────
Standard errors in parentheses
* p<0.05, ** p<0.01, *** p<0.001