This page contains links to a set of figures, tables, and programs that we left out of the submitted versions of our papers because of space constraints or a judgement that they were of specialized interest.
This was my job market paper and I remain fond of it. It estimates a model of the steel rail industry that is "structural enough" that I could use it for policy counterfactuals. The question of whether infant industry protection can be used predictably to improve welfare remains, I think, an important unresolved issue.
The data in Stata 13 format, a file showing definitions and sources of variables, and the raw text and dictionary files.
Crozet, Matthieu, Keith Head, and Thierry Mayer, CEPR Discussion Paper.
Monte Carlo simulation of Tobit with log normal disturbances and an unknown censoring point
All Champagne regressions and means figures re-estimated using the red Burgundy data
Additional results and explanation not included in the working paper.
Head, K., T. Mayer and J. Ries, 2010, Journal of International Economics, 81(1):1-14.
The "square" gravity dataset for all world pairs of countries, 1948-2006.
All variables are described in the paper, which you should consult for details and cite when using these data. For replication purposes we also provide a smaller dataset, restricted to observations with non-missing trade flows.
This paper introduces a method for estimating bilateral trade equations that we call the "method of Tetrads." It has broad applicability to models with a bilateral dependent variable x(ij) that is modeled as a multiplicative function of two monadic terms, s(i) and m(j), a dyadic term d(ij) that depends on pairwise variables, and a log-normal error term.
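To make the cancellation concrete, here is a minimal numeric check (a Python illustration with made-up s, m, and d values, not the paper's code): the monadic terms drop out of the ratio of ratios, leaving only the dyadic terms.

```python
import random

random.seed(0)

# Hypothetical monadic terms (exporter s_i, importer m_j) and dyadic terms d_ij.
s = {c: random.uniform(1, 10) for c in ["i", "k"]}   # exporter terms
m = {c: random.uniform(1, 10) for c in ["j", "l"]}   # importer terms
d = {(o, p): random.uniform(0.1, 1) for o in ["i", "k"] for p in ["j", "l"]}

def x(o, p):
    """Bilateral flow (error term omitted): x_ij = s_i * m_j * d_ij."""
    return s[o] * m[p] * d[(o, p)]

# The tetrad: a ratio of ratios over exporters (i, k) and importers (j, l).
tetrad = (x("i", "j") / x("i", "l")) / (x("k", "j") / x("k", "l"))

# The monadic terms s and m cancel, leaving only the dyadic terms.
dyad_only = (d[("i", "j")] / d[("i", "l")]) / (d[("k", "j")] / d[("k", "l")])

assert abs(tetrad - dyad_only) < 1e-12
```

The same cancellation is what lets the tetrad regression identify the dyadic determinants without estimating country fixed effects directly.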
Stata ado file for the tetrad regression. You specify the regression you want to run and the two countries to use as the "reference" countries, k and "ell"; the i and j countries are given by the variables iso_o (i) and iso_d (j). See the code for more information.
CGM (Cameron, Gelbach, and Miller) program for multi-way clustered standard errors.
Stata do file to run interactively to obtain the results in columns 4-6 of Table 2 of our CEPR paper. You'll need the old version of the data set, col_regfile08.dta, for this to work. Eventually we will upload the current version of the data, col_regfile09.dta.
In the current version of the paper, specification (6) estimates tetrads on 30 different sets of reference pairs. For this specification we do not show the actual standard errors; instead we report the standard deviation of the estimates across the 30 regressions, along with the range from the 10th to the 90th percentile. For these regressions you do not need the CGM standard errors, which add a lot of time to the calculation. Hence we use tetrad_areg.ado, which employs Stata's areg command. Its standard errors are not useful, but the coefficients are the same as tetrad.ado would yield.
This figure (deleted from the CEPR version to save space) shows the number of dyads (observations for exporter i and importer j) with positive bilateral trade flows in each year according to the timing of independence. We show four categories of colonial relationships: current colonies (solid lines) as well as former colonies after 1-19 years (long dashes), 20-49 years (shorter dashes), and more than 50 years (dots and dashes) of independence. The main point we draw from this figure is that sample sizes appear large enough to estimate the effects of varying numbers of years since independence. The bump up in trade dyads for current colonies arises because of increases in data availability in 1958 (France begins to report data on its dependencies) and 1960 (newly independent French colonies begin to report). The 1961 jump in dyads that have been independent 1-19 years is followed two decades later by a jump in the number of dyads with 20-49 years, as the African former French colonies "progress" through the intervals of years since independence.
While the problems with estimating conditional-on-positive (COP) regressions are well known, using Tobit as a way to integrate the zero trade observations is also problematic. Here we show the "bad" COP regressions followed by four different Tobits. The first table shows the Tobit coefficients, which are marginal effects in terms of the latent (unobserved) variable. Angrist and Pischke argue that the proper object of interest is the marginal effect on the observed dependent variable. These effects are always scaled down by a fixed proportion (related to the share of zeros), so we show them only for a subset of the regressors. The main takeaway from these tables is that small, reasonable changes in the method of implementing Tobit yield serious changes in the results. Indeed, COP lies between these results, suggesting it is not so bad after all.
Graphical methods help by showing what's in the raw data, but the challenge is to devise figures that control for other variables. That is, how do you make a two-dimensional figure more like a multiple regression? In this paper we present two figures that illustrate a method that may have broader applications. We compare the trade of two colonies with their respective metropoles relative to a gravity benchmark given by the relative size of the metropole economies. We implemented the method in R for more cases than would fit in the paper, so here is a larger set of examples.
The algebra and programming for decomposing independence effects into their impacts on trade with the metropole, with siblings (which may or may not have exited the empire in the same year as the given country), and with the rest of the world are explained briefly in words in our paper; the full algebra underlying our subsection 4.6 results is shown here.
Figures showing the Metropole-Siblings-ROW effects for amicable and hostile separations.
Head, Keith and John Ries, 2008, Journal of International Economics.
Our paper assumes that distance costs of FDI enter in a linear-in-logs (constant elasticity) formulation. We compare this to the non-parametric approach that Eaton and Kortum (2002) used for trade flows. They divide bilateral distances into six categories with thresholds (in miles) of 100, 375, 750, 1500, 3000, and 6000.
The results for the two methods, generated in an R program, are compared graphically in PDF figures in levels and in logs. Confidence intervals for the category dummies are shown at the midpoint of each interval; these depend on the estimated standard error of each distance dummy. The figures also show the implied distance reduction under the linear-in-logs parametric approach. The midpoint of the first interval is normalized to zero under both approaches. The distance reduction is calculated as -theta ln(d_k / d_1), where the d_k are the six midpoints. The figures show that the parametric prediction always lies within the confidence intervals of the corresponding non-parametric category. In addition, the parametric equation obtains a slightly higher R-squared and a slightly lower RMSE.
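To illustrate the parametric benchmark, the following Python sketch (not our R code) computes -theta ln(d_k / d_1) at a set of category midpoints. Both the value theta = 1 and the arithmetic-midpoint rule for the intervals are illustrative assumptions, not the paper's values.

```python
import math

theta = 1.0  # illustrative distance elasticity; the estimated value is in the paper

# Hypothetical category midpoints in miles (an assumed summary of the intervals
# implied by the thresholds 100, 375, 750, 1500, 3000, and 6000).
midpoints = [237.5, 562.5, 1125.0, 2250.0, 4500.0, 9000.0]

def reduction(d_k, d_1=midpoints[0]):
    # Implied distance reduction, normalized to zero at the first midpoint.
    return -theta * math.log(d_k / d_1)

for d_k in midpoints:
    print(f"{d_k:8.1f} miles: {reduction(d_k):+.3f}")
```

Because of the normalization, the first category's reduction is exactly zero and each subsequent category falls by theta times the log-distance gap between midpoints.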
Head, Keith, Thierry Mayer, and John Ries, CEPR DP 6542, forthcoming at the European Economic Review.
Stata code for the two-step negative binomial QMLE. This specification regresses a non-negative flow variable (in levels, not in logs) on a right-hand side specified by an "RHS" macro.
global RHS = "lgdp_o lgdp_d ld lang colony comleg"
glm flow_o $RHS if flow_o>=0, family(nbinomial 1) robust irls
predict mhat
gen uhatsq = (flow_o-mhat)^2
gen mhatsq = mhat^2
gen y = uhatsq - mhat
reg y mhatsq, nocon
scalar etasq = _b[mhatsq]
drop y mhatsq mhat uhatsq
global a=etasq
glm flow_o $RHS if flow_o>=0, family(nbinomial $a) robust irls
di "dist effect: " -_b[ld] " standard error: " _se[ld] " number of obs: " e(N) " etasq = " $a
Use of the two-step negative binomial QMLE for non-count data may not be a good idea because the estimates depend on the units. Poisson and Gamma QMLE results do not depend on the units of the dependent variable. See `Gravity, log of gravity and the "distance puzzle"' by Clement Bosquet and Herve Boulhol for a critique of using negative binomial QMLE for estimating gravity equations. These authors do not like Gamma QMLE either.
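The units point can be checked directly. The following stdlib-Python sketch (an illustration on fabricated data, not the Stata code above) fits a hand-rolled Poisson QMLE before and after multiplying the dependent variable by 100: the slope is unchanged and the constant absorbs ln(100). Repeating the exercise with negative binomial weighting would not leave the slope unchanged.

```python
import math
import random

random.seed(1)

# Fabricated data: y is roughly exp(1 + 0.5 x) with multiplicative noise.
n = 200
xs = [random.uniform(0, 2) for _ in range(n)]
ys = [math.exp(1 + 0.5 * x) * random.uniform(0.5, 1.5) for x in xs]

def poisson_qmle(x, y, iters=50):
    """Poisson QMLE of y = exp(a + b x) via Newton-Raphson on the first-order conditions."""
    a, b = math.log(sum(y) / len(y)), 0.0  # safe starting values
    for _ in range(iters):
        s0 = s1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            mu = math.exp(a + b * xi)
            s0 += yi - mu            # score w.r.t. the constant
            s1 += (yi - mu) * xi     # score w.r.t. the slope
            h00 += mu                # (negative) Hessian entries
            h01 += mu * xi
            h11 += mu * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * s0 - h01 * s1) / det
        b += (h00 * s1 - h01 * s0) / det
    return a, b

a1, b1 = poisson_qmle(xs, ys)
a2, b2 = poisson_qmle(xs, [100.0 * yi for yi in ys])  # change the units of y

print(b1, b2)  # slopes essentially identical; a2 - a1 equals ln(100)
```

The invariance is mechanical: rescaling y by c satisfies the same Poisson first-order conditions with the constant shifted by ln(c), whereas the negative binomial variance function mu + eta^2 mu^2 does not rescale proportionally.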
Head, Keith and Anne-Celia Disdier, Review of Economics and Statistics, 2008.
Is there an objective criterion one can use for eliminating outliers from a single series? The whole practice of eliminating outliers is frowned upon by many. However, if you have a good reason to do so (we wanted to graph the data, and some of the outliers were so far out that the graph would have been useless), it makes sense to follow an objective and replicable procedure. We used the Grubbs test, which can be implemented in Stata using this Stata .ado file (Anne-Celia Disdier and I wrote the code; let me know if you find bugs).
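For readers who want the logic outside Stata, here is a minimal Python sketch of the Grubbs statistic (the .ado file above is what we actually used; the critical value below is copied from standard Grubbs tables for n = 10 at the two-sided 5% level, and is an assumption you should look up for your own n and alpha).

```python
import statistics

def grubbs_statistic(data):
    """Return (G, index): G = max |x_i - mean| / s, with the candidate outlier's index."""
    mean = statistics.fmean(data)
    s = statistics.stdev(data)  # sample standard deviation
    deviations = [abs(x - mean) for x in data]
    g = max(deviations) / s
    return g, deviations.index(max(deviations))

# Example: 50 is a blatant outlier among 1..9.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
g, idx = grubbs_statistic(data)

# Critical value from published Grubbs tables (assumption: n = 10, alpha = 0.05, two-sided).
G_CRIT = 2.290

print(g > G_CRIT, data[idx])  # True 50
```

If G exceeds the critical value, the flagged point is removed and the test can be re-run on the remaining series; because the test uses one critical value per (n, alpha), the procedure is fully replicable.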