Supplemental Material

This page contains links to a set of figures, tables, and programs that we left out of the submitted versions of our papers because of space constraints or a judgement that they were of specialized interest.

Quality sorting and trade: Firm-level evidence for French wine

Crozet, Matthieu, Keith Head, and Thierry Mayer, CEPR Discussion Paper.

Monte Carlo simulation of Tobit with log-normal disturbances and an unknown censoring point

All Champagne regressions and means figures re-estimated using the red Burgundy data

From Beijing to Bentonville: Do Multinational Retailers Link Markets?

Additional results and explanation not included in the working paper.

The erosion of colonial trade linkages after independence

Head, K., T. Mayer and J. Ries, 2010, Journal of International Economics, 81(1):1-14.

The "square" gravity dataset for all world pairs of countries, 1948-2006.

All variables are described in the paper, which you should consult for details and cite when using these data. We also provide, for replication purposes, a smaller dataset restricted to observations with non-missing trade flows.

This paper introduces a method for estimating bilateral trade equations that we call the "method of Tetrads." It has broad applicability to models with a bilateral dependent variable x(ij) which is modeled as a multiplicative function of two monadic terms, s(i) and m(j), and a dyadic term d(ij) that depends on pairwise variables, together with a log-normal error term.
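To make the cancellation behind the method concrete, here is a small numerical sketch (Python, with made-up numbers; this is not the authors' code). The tetrad, a ratio of ratios taken relative to a reference exporter k and reference importer ell, eliminates the monadic terms s(i) and m(j), leaving only the dyadic components:

```python
# Illustrative check that the tetrad ratio-of-ratios cancels the
# monadic terms. All values below are invented for the demonstration.
s = {"i": 2.0, "k": 5.0}       # exporter-specific (monadic) terms
m = {"j": 3.0, "ell": 7.0}     # importer-specific (monadic) terms
d = {("i", "j"): 0.4, ("i", "ell"): 0.9,
     ("k", "j"): 0.6, ("k", "ell"): 0.8}  # dyadic terms

def x(o, dest):
    """Bilateral flow without the error term: x(ij) = s(i) m(j) d(ij)."""
    return s[o] * m[dest] * d[(o, dest)]

# Tetrad relative to reference exporter k and reference importer ell
tetrad_x = (x("i", "j") * x("k", "ell")) / (x("i", "ell") * x("k", "j"))
# The same ratio built from the dyadic terms alone
tetrad_d = (d[("i", "j")] * d[("k", "ell")]) / (d[("i", "ell")] * d[("k", "j")])

print(abs(tetrad_x - tetrad_d) < 1e-12)  # True: s(i), m(j) have cancelled
```

Because the monadic terms drop out, the tetrad can be regressed on the tetraded dyadic covariates without estimating exporter and importer fixed effects directly.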

Stata ado file for the tetrad regression. You specify the regression you want to run and the two countries to use as the "reference" countries k and "ell"; the i and j countries are given by the variables iso_o (i) and iso_d (j). See the code for more information.

CGM program for multi-way clustered standard errors.

Stata do file to run interactively to obtain the results in columns 4-6 of Table 2 of our CEPR paper. You'll need the old version of the data set, col_regfile08.dta, for this to work. Eventually we will upload the current version of the data, col_regfile09.dta.

In the current version of the paper, specification (6) estimates tetrads on 30 different sets of reference pairs. For this specification we do not show the actual standard errors, just the standard deviations of the estimates from the 30 regressions; we also show the range from the 10th to the 90th percentile. For these regressions you do not need the CGM standard errors, which add a lot of time to the calculation. Hence we use tetrad_areg.ado, which employs Stata's areg command. The standard errors it reports are not useful, but the coefficients are the same as tetrad.ado would yield.

This figure (deleted from the CEPR version to save space) shows the number of dyads (observations for exporter i and importer j) of positive bilateral trade flows in each year according to the timing of independence. We show four categories of colonial relationships: current colonies (solid lines) as well as former colonies after 1-19 years (long dashes), 20-49 years (shorter dashes), and more than 50 years (dots and dashes) of independence. The main point we draw from this figure is that sample sizes appear large enough to estimate the effects of varying numbers of years since independence. The bump up in trade dyads for current colonies arises because of increases in data availability in 1958 (France begins to report data on its dependencies) and 1960 (newly independent French colonies begin to report). The 1961 jump in dyads that have been independent 1-19 years is followed two decades later by a jump in the number of dyads with 20-49 years, as the African former French colonies "progress" through intervals of independence.

While the problems with estimating conditional-on-positive (COP) regressions are well known, using Tobit as a way to integrate the zero trade observations is also problematic. Here we show the "bad" COP regressions followed by four different Tobits. The first table shows the Tobit coefficients, which are marginal effects in terms of the latent (unobserved) variable. Angrist and Pischke argue that the proper object of interest is the marginal effect on the observed dependent variable. These effects are always scaled down by a fixed proportion (related to the share of zeros), so we show them only for a subset of the regressors. The main takeaway from these tables is that small, reasonable changes in the implementation of Tobit yield serious changes in the results. Indeed, COP lies within the range of these results, suggesting it is not so bad after all.
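The scaling referred to above follows from the textbook Tobit formula for the marginal effect on the observed dependent variable: for censoring from below at c, dE[y|x]/dx_k = beta_k * Phi((x'beta - c)/sigma). A minimal Python sketch with illustrative numbers (none taken from the paper):

```python
from math import erf, sqrt

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

# Illustrative values only (not estimates from any of our tables)
beta = -1.2    # latent-variable coefficient, e.g. on log distance
xb = 0.5       # linear index x'beta at the evaluation point
c = 0.0        # censoring point
sigma = 1.0    # scale of the latent error

# Scaling factor: the probability of being uncensored at this x,
# which is why it is tied to the share of zeros in the sample
scale = norm_cdf((xb - c) / sigma)
marginal_observed = beta * scale   # always smaller in magnitude than beta
```

Because 0 < Phi(.) < 1, the observed-variable marginal effect shrinks every latent coefficient by the same factor at a given evaluation point.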

Graphical methods help by showing what is in the raw data, but the challenge is to devise figures that control for other variables. That is, how do you make a two-dimensional figure more like multiple regression? In this paper we present two figures that illustrate a method that may have broader applications: we compare the trade of two colonies with their respective metropoles relative to a gravity benchmark given by the relative size of the metropole economies. We implemented the method in R for more cases than would fit in the paper, so here is a larger set of examples.

The decomposition of independence effects into effects on trade with the metropole, with siblings (which may or may not have exited the empire in the same year as the given country), and with the rest of the world is explained briefly in words in our paper; the full algebra and programming underlying our subsection 4.6 results are shown here.

Figures showing the Metropole-Siblings-ROW effects for amicable and hostile separations.

FDI as an Outcome of the Market for Corporate Control: Theory and Evidence

Head, Keith and John Ries, 2008, Journal of International Economics.

Our paper assumes that distance costs of FDI enter in a linear-in-logs (constant elasticity) formulation. We compare this to the non-parametric approach that Eaton and Kortum (2002) used for trade flows. They divide bilateral distances into six categories with thresholds (in miles) of 100, 375, 750, 1500, 3000, and 6000.

The results of the two methods, generated in an R program, are compared graphically in PDF figures for levels and logs. Confidence intervals for the category dummies are shown at the midpoint of each interval; these depend on the estimated standard error of each distance dummy. The figures also show the implied distance reduction under the linear-in-logs parametric approach. The midpoint of the first interval is normalized to zero under both approaches, and the distance reduction is calculated as -theta ln(d_k / d_1), where the d_k are the six midpoints. The figures show that the parametric prediction always lies within the confidence interval of the corresponding non-parametric category. In addition, the parametric equation obtains a slightly higher R-squared and a slightly lower RMSE.
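As a rough illustration of the binning and the parametric comparison, here is a Python sketch (not the paper's R program). The value of theta, the use of geometric midpoints, and the assumed top of the open-ended last interval are choices made here for concreteness, not taken from the paper:

```python
from bisect import bisect_right
from math import log, sqrt

edges = [100, 375, 750, 1500, 3000, 6000]  # thresholds in miles (from the text)
d_max = 12000                              # assumed upper bound for the open interval
theta = 1.0                                # illustrative distance elasticity

def category(dist):
    """0-based index of the distance category; assumes dist >= edges[0]."""
    return bisect_right(edges, dist) - 1   # edges[k] <= dist < edges[k+1]

bounds = edges + [d_max]
# Geometric midpoint of each interval (one possible midpoint convention)
midpoints = [sqrt(bounds[k] * bounds[k + 1]) for k in range(6)]
# Parametric reduction -theta*ln(d_k/d_1), zero in the first category
reduction = [-theta * log(mk / midpoints[0]) for mk in midpoints]
```

Plotting `reduction` against the category dummies at these midpoints reproduces the kind of comparison described above: the parametric line should pass through (or near) each dummy's confidence interval.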

How remote is the offshoring threat?

Head, Keith, Thierry Mayer, and John Ries, CEPR DP 6542, forthcoming in the European Economic Review.

Stata code for the two-step negative binomial QMLE. This specification regresses a non-negative flow variable (in levels, not in logs) on a right-hand side specified with an "RHS" macro.

global RHS = "lgdp_o lgdp_d ld lang colony comleg"

* Step 1: negative binomial QMLE with the dispersion parameter set to 1
glm flow_o $RHS if flow_o>=0, family(nbinomial 1) robust irls
predict mhat                     /* fitted conditional mean */
gen uhatsq = (flow_o-mhat)^2     /* squared residuals */
gen mhatsq = mhat^2
gen y = uhatsq - mhat
* Auxiliary regression: the slope estimates the dispersion parameter eta^2
reg y mhatsq, nocon
scalar etasq = _b[mhatsq]
drop y mhatsq mhat uhatsq
global a = etasq
* Step 2: re-estimate with the estimated dispersion parameter
glm flow_o $RHS if flow_o>=0, family(nbinomial $a) robust irls
di "dist effect: " -_b[ld] "  standard error: " _se[ld] "  number of obs: " e(N) "  etasq = " $a

Use of the two-step negative binomial QMLE for non-count data may not be a good idea because the estimates depend on the units. Poisson and Gamma QMLE results do not depend on the units of the dependent variable. See `Gravity, log of gravity and the "distance puzzle"' by Clement Bosquet and Herve Boulhol for a critique of using negative binomial QMLE for estimating gravity equations. These authors do not like Gamma QMLE either.

The Puzzling Persistence of the Distance Effect on Bilateral Trade

Head, Keith and Anne-Celia Disdier, Review of Economics and Statistics, 2008.

Is there an objective criterion one can use for eliminating outliers from a single series? The whole practice of eliminating outliers is frowned upon by many. However, if you have a good reason to do so (we wanted to graph the data, and some of the outliers were so far out that the graph would have been useless), it makes sense to follow an objective and replicable procedure. We used the Grubbs test, which can be implemented in Stata using this .ado file (Anne-Celia Disdier and I wrote the code; let me know if you find bugs).
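For readers without Stata, the core of the test can be sketched in Python (this is not the .ado code; the series below is made up, and a complete test would compare G to a critical value derived from the t distribution, which the ado file handles):

```python
from math import sqrt

def grubbs_statistic(xs):
    """Grubbs statistic G = max|x_i - mean| / s, and the index of the
    most extreme observation. Significance requires a t-based critical
    value, omitted here."""
    n = len(xs)
    mean = sum(xs) / n
    s = sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))  # sample std dev
    g, idx = max((abs(x - mean) / s, i) for i, x in enumerate(xs))
    return g, idx

series = [10.1, 9.8, 10.3, 10.0, 9.9, 25.0]  # invented series with one outlier
g, idx = grubbs_statistic(series)            # flags the last observation
```

The procedure is typically applied iteratively: remove the flagged observation if G exceeds the critical value, then re-test the remaining series.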