Bayesian methods for large scale small area estimation (SAE)
Submitting Institution
Plymouth University
Unit of Assessment
Mathematical Sciences
Summary Impact Type
Societal
Research Subject Area(s)
Mathematical Sciences: Statistics
Economics: Econometrics
Summary of the impact
Small area estimation (SAE) describes the use of Bayesian modelling of
survey and administrative data in order to provide estimates of survey
responses at a much finer level than is possible from the survey alone.
In recent years, academic publications have mostly developed SAE
methodology using small-scale examples. Only predictions based on
realistically sized samples have the potential to influence governance,
and our contribution is to fill this niche by delivering such SAEs on a
national scale through the use of a scaling method. The impact case study
concerns the use of these small area
predictions to develop disease-level predictions for some 8,000 GPs in
England and so to produce a funding formula for use in primary care that
has informed the allocation of billions of pounds of NHS money. The value
of the model has been recognised in NHS guidelines. The methodology has
begun to have impact in other areas, including the BIS 'Skills for Life'
survey.
Underpinning research
The Statistics Group at Plymouth specialises in Bayesian modelling
methods, with expertise in applying the necessary computationally
intensive methods, using modern High Performance Computing (HPC) where
necessary. Our work on survey methods, spatial statistics, and disclosure
control methods gives us a strong understanding of the sources and
structures of data that underpin our SAE work. Our applied modelling
interests have led to the development of a methodology for a range of
response types, including new frameworks for modelling multinomial
response and multivariate response types by means of latent structure
models, quantile regression methods, and spatial methods. This enables a
wide range of applications in such areas as disease mapping and financial
analysis. One early example of such modelling work was applied to cancer
mapping [1]. The underlying methodology has been continuously developed
within the Statistics Group at Plymouth and has been applied to other
problems. Other established work by statisticians at Plymouth concerns
disclosure control. Administrative data are only released with
limitations on the ability to deduce details about individuals. Hence,
small area predictions require the reconstruction of the multiway tables
underlying a number of two- and three-way tables that have been released
under the limits for disclosure control. Our work in this area
(particularly [4]) underpins this reconstruction and is therefore key to
the estimation.
Our SAE work is exceptional and distinctive in that:
- We fit relatively complex models to survey data (which may consist of
several thousand person types), and can thus obtain a more sensitive
understanding of the risk surface [1],
- Our understanding of disclosure control methods enables us to derive
usable information on these person types based on released
administrative data [2,3,4],
- We simulate predictive distributions for the risk of disease for these
person types in 6,781 middle-level super-output areas in the whole of
England,
- And we then re-apportion such estimates to some 8,000 GP practices
[5,6].
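The last two steps in the list above can be illustrated with a minimal sketch. It assumes posterior draws of disease risk per person type are already available from a survey model, along with counts of each person type per small area and each area's share of population registered with each GP practice. All names, array sizes and numbers below are hypothetical and chosen only for illustration; this is not the model used in [5,6].

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws, n_types, n_areas, n_practices = 200, 4, 3, 2

# Hypothetical posterior draws of disease risk per person type (draws x types).
# In practice these would come from a model fitted to survey data.
risk_draws = rng.beta(2.0, 18.0, size=(n_draws, n_types))

# Hypothetical counts of each person type in each small area (areas x types).
type_counts = np.array([[120, 80, 60, 40],
                        [200, 150, 90, 60],
                        [50, 70, 30, 20]])

# Hypothetical share of each area's population registered with each practice
# (areas x practices); every row sums to 1.
area_to_practice = np.array([[0.7, 0.3],
                             [0.2, 0.8],
                             [0.5, 0.5]])


def simulate_area_cases(risk_draws, type_counts, rng):
    """Simulate case counts per area for every posterior draw (draws x areas)."""
    # Broadcast to (draws, areas, types), sample cases, then sum over types.
    cases = rng.binomial(type_counts[None, :, :], risk_draws[:, None, :])
    return cases.sum(axis=2)


def reapportion(area_cases, area_to_practice):
    """Re-apportion area-level cases to practices (draws x practices)."""
    return area_cases @ area_to_practice


area_cases = simulate_area_cases(risk_draws, type_counts, rng)
practice_cases = reapportion(area_cases, area_to_practice)

# Summarise the simulated predictive distribution for each practice.
practice_mean = practice_cases.mean(axis=0)
practice_interval = np.percentile(practice_cases, [2.5, 97.5], axis=0)
```

Because every row of the weight matrix sums to one, the total number of cases in each draw is preserved when moving from areas to practices, and the same matrix product would re-aggregate to any other geography given suitable weights.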
This is a computationally demanding task. There is a growing literature
on SAE but, without our ability to apply it to the whole of England, these
methods would not be trusted to inform decisions on a national scale. In
SAE, our work out-performs that of others. The sheer volume of
calculations requires High Performance Computing to achieve the necessary
throughput. Given fine-grained predictions, we can re-aggregate
to whatever geography is of interest, which currently enables us to study
access to healthcare. The data needed to produce predictive distributions
for small areas are not easily available, for data protection reasons. For
example, we require not only posterior distributions from survey models
but also auxiliary (individual-level) data on the 6,781 middle-level
super-output areas, covering the varied demographic information that
matches the census and other administrative data with the survey data used
in the modelling. For disclosure control reasons, such data are not
published. The research leading to impact here involves the reconstruction
of multiway tables from a number of two- and three-way tables that have
been released (with adjustments to limit disclosure). The underpinning
work in this area (see [4]) is therefore a key part of the estimation
process, enabling us to quantify uncertainty in the auxiliary variables.
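To make the idea of rebuilding a multiway table from released lower-order tables concrete, the sketch below uses iterative proportional fitting (IPF), a standard textbook technique. It is a stand-in for illustration only: the model-based approach of [4] is not reproduced here, and IPF returns a single point estimate rather than a quantification of uncertainty. The function name, table dimensions and example data are assumptions.

```python
import numpy as np


def ipf_three_way(m_ab, m_ac, m_bc, n_iter=500, tol=1e-10):
    """Estimate a three-way table (A x B x C) from its three two-way margins
    by iterative proportional fitting, starting from a uniform table."""
    a, b = m_ab.shape
    c = m_ac.shape[1]
    table = np.full((a, b, c), m_ab.sum() / (a * b * c))
    for _ in range(n_iter):
        previous = table.copy()
        # Rescale so that summing over C reproduces the A x B margin.
        fitted = table.sum(axis=2)
        table *= (m_ab / np.where(fitted > 0, fitted, 1.0))[:, :, None]
        # Rescale so that summing over B reproduces the A x C margin.
        fitted = table.sum(axis=1)
        table *= (m_ac / np.where(fitted > 0, fitted, 1.0))[:, None, :]
        # Rescale so that summing over A reproduces the B x C margin.
        fitted = table.sum(axis=0)
        table *= (m_bc / np.where(fitted > 0, fitted, 1.0))[None, :, :]
        if np.abs(table - previous).max() < tol:
            break
    return table


# Hypothetical "true" table, e.g. age band x sex x health status; only its
# two-way margins are treated as released.
rng = np.random.default_rng(1)
true_table = rng.integers(5, 50, size=(3, 2, 2)).astype(float)
m_ab = true_table.sum(axis=2)
m_ac = true_table.sum(axis=1)
m_bc = true_table.sum(axis=0)

estimate = ipf_three_way(m_ab, m_ac, m_bc)
```

The estimate reproduces all three released margins but need not equal the original table, since many different three-way tables share the same two-way margins; this is one reason the uncertainty in the reconstruction matters.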
References to the research
[1] Hewson, P.J. and Bailey, T.C. (2010): Modelling multivariate
disease rates with a latent structure mixture model, Statistical
Modelling 10(3): 241-264.
[2] Burridge, J. (2003): Information preserving statistical
obfuscation, Statistics and Computing 13:321-327.
[3] Franconi, L. and Stander, J. (2002): A model based method for
disclosure limitation of business microdata, Journal of the Royal
Statistical Society, Series D 51: 51-61.
[4] Polettini, S., Franconi, L. and Stander, J. (2002): Model
based disclosure protection. In Domingo-Ferrer, J. (Ed.) Inference
Control in Statistical Databases: from Theory to Practice. Berlin:
Springer-Verlag, pp. 83-96. (Peer reviewed book chapter)
[5] Asthana, S., Gibson, A., Bailey, T., Dibben, C., Hewson, P.,
Economou, T., Batchelor, D., Eastham, J., Craig, R., Scholes, S., Flowers,
J. and Jenner, D. (2008): Person Based Resource Allocation (PBRA): The
Feasibility of Developing a Need-Based Approach to PBRA. Report to
the Department of Health (Policy Research Programme). University of
Plymouth. 118pp.
[6] Asthana, S., Gibson, A., Hewson, P., Bailey, T. and Dibben, C. (2011):
General practitioner commissioning consortia and budgetary risk: evidence
from the modelling of fair shares practice budgets for mental health, Journal
of Health Services Research & Policy 16:95-101.
Details of the impact
The research carried out in the area of statistical disclosure control
has led to significant changes in the dissemination of micro data in
official statistics. The seminal paper by Franconi and Stander [3] and
subsequent work in the area of model-based micro-data protection (see [4])
have resulted in the release of micro-data for research purposes from the
Italian sample of the Structure of Earnings Survey. This survey, which
leads to a linked employer-employee database harmonized at European level,
is now approaching its third wave of release. Economists have carried out
several studies using the micro data file for research distributed at
national level by Istat, and at EU level by Eurostat.
The ideas contained in another paper, published by researchers from
Plymouth University in the field of statistical disclosure control,
Burridge (2003) [2], have opened the field to the use of perturbation
methods showing sound statistical properties by construction. Again, a new
micro data file for research purposes on the system of account will be
released in the near future, stemming from the work carried out at
Plymouth University. Moreover, the IPSO method (Information Preserving
Statistical Obfuscation) has also been implemented in the mu-Argus
software (mu-Argus, 2013), which has been designed to create files of
individual data to be released either for research purposes or for the
public (1). Many national statistical offices in Europe and statistical
agencies around the world use this software. Some of the developments
contained in Polettini and Stander (2005), leading to an alternative
evaluation and a related accurate approximation of the risk of
re-identification, have also been implemented in the mu-Argus software.
Given a good understanding of administrative data systems, we have been
able to contribute to problem solving within the UK. For example, 13.9% of
the total primary care budget is directed to mental health, a total of £8
billion. The indicative allocation of several billion pounds of public
money to GP practices, to provide services for mental health, is informed
by what the NHS refers to as the 'Plymouth Model'. The user guide (DoH,
2009, page 10) states:
'For mental health, the toolkit includes an entirely new methodology
developed by Plymouth University specifically for practice based
commissioning. This new approach moves away from modelling historic
utilisation and estimates need directly based on different person types'
(2).
To do this, we required state-of-the-art epidemiological models applied
to the Health Survey for England, working on case mix classifications, as
well as a statistical reconstruction of the census data, which are known
to be subject to statistical disclosure control. Quoting the NHS (DoH,
2009):
'The new methodology has undergone extensive testing by the researchers
and DoH and we believe it provides a step-change improvement in the way we
model mental health need' (2).
This work has been reported to select committees in parliament and
recognised by MPs, all of whom have noted the size of GP commissioning
groups (3). It has also been fully considered by the National Audit
Office, influencing the way public sector funding may evolve in the future
(4). Finally, our expertise with national-scale small-area estimates has
had further impact on work carried out for the 2011 Skills for Life
Survey, which has been made available by the Department for Business,
Innovation and Skills (5).
Sources to corroborate the impact
(1) Evidence of the impact of work on disclosure control: mu-Argus
software, http://neon.vb.cbs.nl/casc/..%5Ccasc%5Cmu.htm
(the manual is available at http://neon.vb.cbs.nl/casc/..%5Ccascprivate%5Cdeliverables%5CMUManual4.3.pdf)
(2) Evidence of the impact of the work on funding GPs for mental health
provision:
Department of Health (2009) Practice Based Commissioning: budget guidance
for 2009/10: Methodological changes and toolkit guide. DH (see
http://www.dh.gov.uk/prod_consum_dh/groups/dh_digitalassets/documents/digitalasset/dh
094392.pdf)
(3) Asthana, S., Gibson, A. (2010), Funding Implications for Rural PCTs
of the new NHS Resource Allocation Methodology, in All Party Parliamentary
Group on Rural Services, The implications of national funding formulae for
rural health and education provision, Report, Written and Oral evidence.
London, House of Commons.
(4) National Audit Office (2011) 'Cross-government landscape review:
Formula funding of local public services', Report by the Comptroller and
Auditor General, HC 1090, Session 2010-2012.
(5) Small area estimation applied to Skills for Life:
BIS Research Paper Number 81C (2012) 2011 Skills for Life Survey: Small
Area Estimation Technical Report, Department for Business, Innovation and
Skills.