Enabling the econometric analysis of policy interventions

Submitting Institution

Heriot-Watt University

Unit of Assessment

Business and Management Studies

Summary Impact Type


Research Subject Area(s)

Mathematical Sciences: Statistics
Economics: Applied Economics, Econometrics


Summary of the impact

The research created programs for the Stata statistical software environment that are used by thousands of researchers in economics and other fields around the world in academia, the private sector, government and quasi-governmental organisations, with approximately 400,000 downloads in the REF 2014 period. The core programs enable researchers to rigorously analyse the causal impact of a policy in settings where an experiment is infeasible and in experiments where take-up of treatment is incomplete, i.e. in the settings in which the vast majority of empirical work is done. The programs are used to analyse complex data to establish causal links across a broad range of policy areas.

Underpinning research

Econometric research has moved very quickly over the past 30 years, and many advances have proven very useful for applied research. An important obstacle to wide adoption of these techniques has been the limited availability of new estimation and testing methods in commercially available software packages. Commercial providers typically do not introduce a technique until it is generally accepted. Implementation of these sophisticated techniques in a software environment is, moreover, a challenging exercise that itself requires a substantial research investment.

Instrumental variables (IV) estimators have followed this pattern. They are widely used by economists and other empirical researchers when the econometric problem of endogeneity is present. Endogeneity means that standard methods of estimating the causal impact of a policy cannot be used because the policy or "treatment" is non-random. The importance of the problem, and the centrality of IV methods in addressing it, are reflected in the substantial coverage given to IV estimation and its generalisation, the Generalised Method of Moments (GMM), in undergraduate and especially graduate econometrics textbooks. But because developments in the theory of IV estimation have been rapid, availability of cutting-edge methods has been patchy, and many recent advances are simply unavailable to empirical researchers who wish to make use of them.
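The endogeneity problem described above can be illustrated with a small simulation. The following is a minimal sketch in Python, not the Stata routines discussed in this case study; the data-generating process, the variable names and the true effect of 2.0 are all assumptions chosen for illustration. An unobserved factor u drives both the treatment x and the outcome y, so ordinary least squares (OLS) is biased, while a simple one-instrument IV estimator using z recovers the true effect.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((ai - mean_a) * (bi - mean_b) for ai, bi in zip(a, b)) / (n - 1)


def simulate(n=20000, true_effect=2.0, seed=12345):
    """Return (OLS estimate, IV estimate) for a hypothetical endogenous design."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # instrument: affects x, unrelated to u
    u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved factor: affects x and y
    x = [zi + ui for zi, ui in zip(z, u)]    # treatment is non-random (depends on u)
    y = [true_effect * xi + ui for xi, ui in zip(x, u)]

    # OLS slope: cov(x, y) / var(x). In this design it converges to
    # true_effect + cov(x, u) / var(x) = 2.0 + 1/2, i.e. it is biased upwards.
    ols = covariance(x, y) / covariance(x, x)

    # Simple IV slope: cov(z, y) / cov(z, x). Because z is uncorrelated with u,
    # it converges to true_effect.
    iv = covariance(z, y) / covariance(z, x)
    return ols, iv


ols_estimate, iv_estimate = simulate()
print(f"OLS estimate: {ols_estimate:.3f}")
print(f"IV estimate:  {iv_estimate:.3f}")
```

With a single instrument and a single endogenous regressor the IV estimator reduces to this ratio of covariances; the general multi-instrument case and the associated tests are what dedicated routines are needed for.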

Starting in 2003, Schaffer, working primarily in collaboration with Christopher Baum (Boston College) and also with Steven Stillman (University of Otago, New Zealand) and Frank Kleibergen (Brown University) amongst others, developed and implemented a comprehensive set of IV/GMM estimation and testing procedures for the Stata statistical software package. These routines include single-equation and panel data IV/GMM estimation, robust covariance estimation (heteroskedastic, autocorrelation and 1- and 2-way cluster-robust), specification tests for IV estimation (heteroskedasticity, autocorrelation, RESET, overidentification, endogeneity, weak identification), and weak-instrument-robust inference. These procedures brought the range of methods available to researchers working in the Stata environment to the frontier of econometric "best practice". The capabilities of the software and the range of packages have been and continue to be regularly extended; recent examples are the facilities for weak-instrument-robust inference, the introduction of 2-way cluster-robust covariance estimation (a technique that was published in the econometric literature only several years ago) and IV estimation using heteroskedasticity-based instruments (the core theory for which was published only in 2012).

References to the research

[1] Schaffer, M. E., Baum, C., Finlay, K., Kleibergen, F., Magnusson, L. & Stillman, S. 2013 Stata software for econometric estimation and testing; avar, weakiv, actest, ivreg2h, ranktest, ivreg2

[2] Christopher F Baum & Mark E. Schaffer & Steven Stillman, 2003. "Instrumental variables and GMM: Estimation and testing," Stata Journal, StataCorp LP, vol. 3(1), pages 1-31, March. Available on request.

[3] Christopher F Baum & Mark E. Schaffer & Steven Stillman, 2007. "Enhanced Routines for Instrumental Variables/Generalized Method of Moments Estimation And Testing," Stata Journal, StataCorp LP, vol. 7(4), pages 465-506, December. Available on request.

Details of the impact

Stata is a statistical package used worldwide with a large user community. An important feature of Stata is that StataCorp supports integration of user-written software. This has meant that the programs and supporting documentation developed by Schaffer et al. can be and have been easily found and installed by users.

Take-up of the routines by Schaffer et al. has been very substantial. The software is stored on a service run by Boston College, "Statistical Software Components" (SSC), in collaboration with RePEc (Research Papers in Economics, www.repec.org). SSC tracks all software downloads/installations; these details are available via the SSC server. (NB: downloads reported on the RePEc website include only those installations done "by hand" rather than automatically by the Stata package, and understate the number of installations by roughly a factor of 8-10.) As of August 2013, the software described above had been downloaded 470,000 times, with over 400,000 of these downloads during the REF period [S5]. The most widely-used package is the core estimation routine, "ivreg2", downloaded over 130,000 times during the REF period. These downloads include both first-time installations and downloads by existing users of updates to routines.

The programs written by Schaffer et al. are now well established in non-academic institutions, particularly in governmental and quasi-governmental organisations in the US, as well as in academia. A partial list of institutions where use has been documented:
Federal Reserve Board
International Monetary Fund (IMF)
Mathematica Policy Research, Inc.
RAND Corporation
Urban Institute (Washington, DC)
US Census Bureau
US Department of Agriculture (USDA)
US Federal Trade Commission (FTC)
US Government Accountability Office (US GAO)
World Bank

A Lead Economist in the World Bank's Research Department explained how he uses the programs: "In my work where I have to deal either with very large cross-sections, or less frequently, with panel data, several softwares developed by Mark were absolutely indispensable. ... [Our] paper investigated prevalence of corruption in transition economies and argued that it should be less if there were more frequent changes in government (alternation in government). We faced a tough problem of how to find instruments for alternation in power because corruption could in turn affect alternation (ruling party losing election because it is corrupted) and thus the causation would run the other way round. In (another) paper, the problem was similar: here I looked at decile income shares in practically all countries of the world (ten regressions per each country; one per decile) and tried to determine how they were affected by trade/GDP ratios. I used GMM-IV approach, again as developed by Mark. I also used the same software in a paper written with [a colleague] that looked at decile shares in transition countries and the effect that privatization (among other variables) had on them" [S2]

A Senior Research Associate from the Urban Institute described the extensive use they make of the software. "We have completed hundreds of studies where our analytical methods have been significantly affected by your work. The software has been used to analyze the impact of mental health treatment on work, and the impact of housing vouchers on family economic status and health. The current frontier of policy analysis quite simply would be far inside its current boundaries without the development of the IV regression technology at Heriot-Watt University." [S3]

Sources to corroborate the impact

[S1] Professor, Department of Economics, Boston College will confirm that the software is stored on a service run by Boston College, "Statistical Software Components" (SSC), in collaboration with RePEc (Research Papers in Economics, www.repec.org). SSC tracks all software downloads/installations; these details are available via the SSC server.

[S2] Lead Economist, Development Research Group, World Bank will describe the application of the software in a broad range of complex policy environments with very large cross-sections, or with panel data. The software makes the difficult analysis possible with a degree of confidence.

[S3] Senior Research Associate, Urban Institute will confirm that they have completed hundreds of studies where analytical methods have been significantly affected by the work. The software has been used to analyze the impact of mental health treatment on work, and the impact of housing vouchers on family economic status and health.

[S4] Senior Survey Statistician, Abt SRBI (http://srbi.com) will confirm the statistics cited.

[S5] Statistics for verification:

Summary statistics of Stata downloads are regularly announced on the Statalist discussion site, e.g., http://www.stata.com/statalist/archive/2013-01/msg00114.html

Raw data on all downloads can be obtained in Stata format files for January 2008 onwards from the RePEc website. URLs are http://repec.org/docs/sschotPXXX.dta, where XXX=576 for January 2008, 577 for February 2008, and so on up to 642 for July 2013.
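The file numbering above can be sketched as a small helper. This is an illustrative Python sketch, not part of the submitted evidence: the function names are hypothetical, and it assumes the index rises by exactly one per calendar month, which is consistent with the three values quoted (576, 577 and 642). It does not check whether the files are still hosted at these addresses.

```python
# URL pattern quoted in the text, with XXX replaced by a format placeholder.
URL_TEMPLATE = "http://repec.org/docs/sschotP{index}.dta"
JAN_2008_INDEX = 576  # value given in the text for January 2008


def ssc_file_index(year, month):
    """Return the file index for a given month (assumed: +1 per month)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    index = JAN_2008_INDEX + (year - 2008) * 12 + (month - 1)
    if index < JAN_2008_INDEX:
        raise ValueError("files are described as available from January 2008 onwards")
    return index


def ssc_file_url(year, month):
    """Build the download-statistics URL for a given month."""
    return URL_TEMPLATE.format(index=ssc_file_index(year, month))


# Print the URLs for the first and last months mentioned in the text.
print(ssc_file_url(2008, 1))
print(ssc_file_url(2013, 7))
```

The same arithmetic gives the number of monthly files in the range quoted: 642 - 576 + 1 = 67, covering January 2008 to July 2013 inclusive.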

(Note: the download statistics cited above include all downloads, including those from within the Stata software environment using the "ssc install" and "adoupdate" commands. Download statistics listed on the LogEc website of RePEc (http://logec.repec.org/scripts/itemstat.pf?type=redif-software) cover only those downloads done "by hand" and do not include those from within the Stata software environment.)

[S6] http://www.rand.org/pubs/monographs/MG829.html Nancy Nicosia, Rosalie Liccardo Pacula, Beau Kilmer, Russell Lundberg, James Chiesa (2009), "The Economic Cost of Methamphetamine Use in the United States, 2005", RAND Drug Policy Research Center. "This study was sponsored by the Meth Project Foundation and the National Institute on Drug Abuse and was conducted under the auspices of the Drug Policy Research Center, a joint endeavour of RAND Infrastructure, Safety, and Environment and RAND Health." Citation on p. 112: "All of these instruments were assessed in terms of their validity, exogeneity, and exclusion criteria using linear probability models estimated using ivreg2 in Stata 8.2."