Impact on the Statistical Confidentiality Practices of Data Stewardship Organisations
Submitting Institution
University of Manchester
Unit of Assessment
Sociology
Summary Impact Type
Societal
Research Subject Area(s)
Mathematical Sciences: Statistics
Economics: Applied Economics
Summary of the impact
Research at the University of Manchester (UoM) has developed new
approaches, methods and algorithms to improve the statistical
confidentiality practices of data stewardship organisations (DSOs), such
as the UK's Office for National Statistics. The research and its products
have had significant impacts on data dissemination practice, both in the
UK and internationally, and have been adopted by national statistical
agencies, government departments and private companies. The primary
beneficiaries of this work are DSOs, which are able both to disseminate
useful data products and to protect respondent confidentiality more
effectively. Secondary beneficiaries are respondents, whose
confidentiality is better protected, and the research community, as
without `gold standard' disclosure risk analysis, data holders can be
overcautious about releasing data.
Underpinning research
The underpinning research on disclosure risk analysis was carried out at
the Cathie Marsh Centre for Census and Survey Research (CCSR) at UoM
(1996-), led by Dr Mark Elliot (Senior Lecturer, School of Social
Sciences). The Manchester team also includes: Dr Kingsley Purdam (Research
Fellow, 2003-), Dr Elaine Mackey (Research Associate, 2010-), Dr Duncan
Smith (2003-2008, now Honorary Research Fellow), and Susan Lomax (Research
Assistant, 2006-2009).
`Statistical disclosure' occurs when, through statistical matching, a
population unit is identified within an anonymised dataset and/or new
information about them is revealed. This compromises the privacy of the
individuals within the data, breaks data protection legislation and
results in reputational and other damage for those responsible for the
data. Statistical Disclosure Control (SDC) concerns the prevention of such
events. Research at UoM has helped to remodel the way that disclosure
risk analysis is carried out, delivering innovations both in statistical
methods and in the overall framing of the problem.
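The matching process described above can be illustrated with a minimal sketch. It is not taken from the underpinning research: the records, the variable names and the function `unique_matches` are all hypothetical, and exact matching on a handful of key variables is assumed. The sketch links an external file that carries names to an anonymised file that does not, and reports the cases where an external record matches exactly one released record.

```python
from collections import defaultdict


def unique_matches(released, external, keys):
    """Return (name, released_record) pairs for external records that
    match exactly one released record on the given key variables."""
    # Index the released (anonymised) records by their key-variable values.
    index = defaultdict(list)
    for rec in released:
        index[tuple(rec[k] for k in keys)].append(rec)

    matches = []
    for person in external:
        candidates = index.get(tuple(person[k] for k in keys), [])
        if len(candidates) == 1:  # one candidate only: a unique match
            matches.append((person["name"], candidates[0]))
    return matches


# Hypothetical anonymised file: no names, but a sensitive income band.
released = [
    {"age_band": "30-39", "sex": "F", "occupation": "teacher", "income_band": "C"},
    {"age_band": "30-39", "sex": "F", "occupation": "teacher", "income_band": "B"},
    {"age_band": "60-69", "sex": "M", "occupation": "surgeon", "income_band": "E"},
]

# Hypothetical external information available to an intruder.
external = [
    {"name": "Person X", "age_band": "60-69", "sex": "M", "occupation": "surgeon"},
    {"name": "Person Y", "age_band": "30-39", "sex": "F", "occupation": "teacher"},
]

KEYS = ("age_band", "sex", "occupation")
matches = unique_matches(released, external, KEYS)
for name, rec in matches:
    print(name, "->", rec["income_band"])  # prints: Person X -> E
```

Person Y matches two released records and so is not singled out. Where the released file is only a sample of the population, a unique match of this kind is not necessarily a correct one, which is why a measure of risk is needed rather than a simple count.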
The distinctive nature of the research contribution flows from the
insight that the SDC field was data-centric (focusing on the
statistical properties of the data to be disseminated) when it needed to
be intruder-centric, focusing on the means, motives and
opportunities (MMO) of individuals or organisations who might wish to
attack the data. This led to a body of work developing the notion of
scenario analysis [E]. By creating pragmatically based MMO
models for any given dataset it is possible to look at what the intruder
would be trying to achieve, and then generate metrics which measure the
risk of them achieving it. This framework is now widely regarded by
DSOs as an important part of disclosure risk assessment. This
scenario-based framework led in turn to significant methodological
innovations:
- Development of the Data Intrusion Simulation (DIS) measure of risk
for microdata [D] and of SUIM (the Special Unique
Identification Method) [C]. The two metrics were embodied in
software called SUDA, developed in collaboration with colleagues
in computer science, with EPSRC funding. The software has been adapted
for use by several statistical agencies in the UK, the US, Australia,
New Zealand and Singapore, in determining outputs from census and
surveys.
- Parallel to this, the research has extended the notion of scenario
analysis [E]. This extended framework proposes that disclosure
risk arises from the relationship between the data and its environment
(taking into account the additional, external information that might be
available to a would-be intruder), raising the issue of how to
capture/measure the environmental facet of the risk. This led to the
framework of Data Environment Analysis (DEA), a set of methods for
measuring the environmental aspects of the risk including `key variable
mapping' [A]. Until this work, assessment of disclosure risk measured
only one side of the problem. The new framework enables these
additional, environmental risks to be assessed on a formal, principled
basis. On the basis of this work, the Office for National Statistics
(ONS) commissioned and funded the Data Environment Analysis Service
(DEAS, 2008-2012 at UoM) to guide the specifications of the 2011 Census
outputs (2012 Annual Report, available upon request).
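The notion of a special unique, characterised in the ONS testimonial later in this document as a unique that stands out "regardless of geography", can be sketched as follows. This is a simplified illustration under stated assumptions and not the algorithm implemented in SUDA: the records, the variable names and the helper `sample_uniques` are hypothetical. The sketch compares the records that are unique when an area variable is included in the key with those that remain unique once it is dropped.

```python
from collections import Counter


def sample_uniques(records, keys):
    """Return the ids of records whose combination of key values
    occurs exactly once in the file."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {r["id"] for r in records if counts[tuple(r[k] for k in keys)] == 1}


# Hypothetical microdata file with a coarse geography variable.
records = [
    {"id": 1, "area": "North", "age_band": "20-29", "occupation": "clerk"},
    {"id": 2, "area": "North", "age_band": "20-29", "occupation": "clerk"},
    {"id": 3, "area": "South", "age_band": "20-29", "occupation": "clerk"},
    {"id": 4, "area": "South", "age_band": "80-89", "occupation": "pilot"},
]

with_geography = sample_uniques(records, ("area", "age_band", "occupation"))
without_geography = sample_uniques(records, ("age_band", "occupation"))

print(sorted(with_geography))     # [3, 4]
print(sorted(without_geography))  # [4]
```

Record 3 is unique only because of where it is located, whereas record 4 remains unique when geography is ignored. In this simplified reading, record 4 is the kind of record that would be flagged for additional confidentiality treatment.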
References to the research
(all references available upon request)
[A] (2011) Elliot, M., Lomax, S., Mackey, E. & Purdam, K. "Data
Environment Analysis and the Key Variable Mapping System" in
Domingo-Ferrer, J. & Magkos, E. (eds.) Privacy in Statistical
Databases (Springer: Berlin)
[B] (2008) Smith, D. & Elliot, M. "A Measure of Disclosure Risk for
Tables of Counts" Transactions in Data Privacy 1(1) 34-52 (REF
2014)
[C] (2002) Elliot, M. J., Manning, A. M. & Ford, R. W. "A
Computational Algorithm for Handling the Special Uniques Problem" International
Journal of Uncertainty, Fuzziness and Knowledge Based Systems 5(10)
493-509
[D] (2002) Skinner, C. J. & Elliot, M. J. "A Measure of Disclosure
Risk for Microdata" Journal of the Royal Statistical Society: Series
B (Statistical Methodology) 64(4) 855-867 (RAE 2008)
[E] (1999) Elliot M. & Dale, A. "Scenarios of Attack: The Data
Intruder's Perspective on Statistical Disclosure Risk" Netherlands
Official Statistics (Spring) 6-10
Grants: Over the past 13 years, Elliot has been involved in over
£1.58M (Manchester Share) of project funding, from: ESRC, EPSRC, MRC, EU
and ONS. Key grants include:
• (2013-18) Administrative Data Service (£6.18m; Manchester £518K,
Co-Director)
• (2008-13) ESRC: `Administrative Data Liaison Service' (£850K;
Manchester £100K, Co-I)
• (2012-13) Office for National Statistics (ONS): EUL->OGL (£30K, PI)
• (2005-8) MRC: `Clinical E-science Framework II' (£4.5m; Manchester
£2.55m, Co-I)
• (2003-7) EPSRC: `High Performance Computing and Statistical Disclosure
Control' (£260K, Co-I)
• (2001-3) EU: `Computational Aspects of Statistical Confidentiality'
(Total ~2.2M ECU; Manchester 172K ECU, Partner)
Details of the impact
The key impact of this body of research is a reduction in the likelihood
of personal information being disclosed from research data. The immediate
beneficiaries are national and international data stewardship
organisations (DSOs), which are responsible for disseminating data
whilst maintaining confidentiality. The reach of this
impact has increased in recent years from national statistical agencies
(UK and overseas) to include government departments, such as the
Department for Work and Pensions (DWP) and the Department for Communities
and Local Government (DCLG), which are under increasing pressure to release
administrative data as part of a wider `open data' agenda.
Impact has been delivered through collaborative research partnerships
with practitioners and professional services in data stewardship.
This research, consultancy and advisory work in relation to data
management and confidentiality has helped to shape professional best
practice in the field. The body of work described here is impact-embedded
research. An example of this is [B], which describes an algorithm
developed whilst analysing the risk inherent in the Neighbourhood
Statistics, work carried out for ONS several years earlier. Overall,
impact has been delivered through a variety of pathways:
1. Impact through DSOs adopting methods into their practice. The
Canadian firm Privacy Analytics Inc. has utilised [D]: "as part of
the general heuristic for re-identification risk assessment in our
software, [with Elliot's] work on scenarios of attack (1999)
and SAR data (2001, 2007) helping shape the methodology that is being
used by the company. Privacy Analytics, Inc.'s solutions have been
leveraged by approximately 70 organizations in the US and Canada to
control their disclosure risk, including: the American Society of
Clinical Oncology, Heritage Provider Network, Icahn School of Medicine
at Mount Sinai, Department of Preventative Medicine, State of
Louisiana Department of Health and Hospitals, Alberta Healthcare,
Cancer Care Ontario, Children's Hospital of Eastern Ontario, College
of Family Physicians of Canada, Public Health Agency of Canada, TELUS
Health, and Vancouver Coastal Health" [1]. Likewise, after reading
articles and presentations of the research at meetings, the Australian
Bureau of Statistics noted that: "Your work helped shape ABS's
current thinking on confidentiality protection for microdata...
[and has] assisted the ABS to produce high quality statistics while
maintaining provider trust" [2]. Specifically on [D], Statistics
Canada say: "The Skinner-Elliot measure, adapted for survey weights
by Skinner & Carter, is often used by our surveys to provide an
indication of the risk inherent in certain combinations of their
variables. The measure has been incorporated in our in-house
disclosure risk generalized program" [3].
On documents [C] and [E], ONS say: "Previous work carried out by Dr
Elliot continues to have an impact on the way we conduct our
assessment of risk at ONS [and] ONS occasionally use the
Skinner and Elliot (2002) metric to assess microdata releases which
come to us for approval via our Microdata Release Panel and in
particular a significant contribution consistently used is Dr Elliot's
concept of special uniqueness involving the identification of uniques
that stand out regardless of geography. Many of his research ideas
have been fundamental in informing GSS Disclosure Control Policy for
Social Survey Microdata" [4]. Further afield, the US Census Bureau
contends that the research has "had a significant impact on the way
in which disclosure control is carried out in our organisation",
and that they have adopted the concept of special uniques in their
reidentification studies which: "has helped us to identify public use
microdata records that are at risk of disclosure. We are then able to
slightly distort those records so that we can release them while
preserving the statistical properties of the data" [5].
2. Based on the principles developed in [A] and [E], Elliot set up the
Data Environment Analysis Service (DEAS) with funding from ONS, to
guide the specifications of the 2011 Census outputs. This informed ONS's
own disclosure risk analyses and policy, with their Head of Statistical
Disclosure confirming that it: "greatly assisted ONS in performing
day-to-day disclosure risk assessments of our microdata and knowing
under what conditions they should be released. It also informed the
specification of the Sample of Anonymised Records (microdata from the
2011 Census)" [4].
3. The provision of consultancy reports to DSOs, evaluating the
disclosure risks from specific datasets (to be disseminated) or
analysing the risk impact of particular policy decisions.
Commissioned on the basis of the expertise and track record of Dr Elliot
and his team, these evaluations (`Report on the Disclosure Risk
associated with Fire and Rescue Incident Datasets', `Report on the
Disclosure Risk associated with Supporting People Datasets', `Disclosure
Risk Audit of the Smart Steps Data Service') directly apply the research
reported above. For example, the report to Telefonica
(`Disclosure Risk Audit of the Smart Steps Data Service') highlighted
the potential risks of providing information at multiple units of
geography, and the need to suppress small counts in remote areas to
ensure individuals cannot be identified through reverse engineering [6].
These reports have enabled DSOs to make better decisions about the data
in question, resulting in safe and usable data. For example, DCLG
confirms that: "[the] work showed that the database which I provided
was far from anonymised in practice" and in turn this:
"...showed that we needed to proceed with access to the database in a
secure setting and/or under data access conditions" [7].
Similarly, ONS note that: "The outcome of [`EUL to OGD: A Risk
Analysis'] will have significant implications on the way ONS will
provide data to its customers in the future, and helps ONS to meet
government strategic aims in proactively supporting the Open Data
Agenda" [4].
4. The provision of dedicated SUDA software to DSOs to help them
carry out their own disclosure risk analyses [8]. Several DSOs, such
as the Australian Bureau of Statistics and the Singapore Department of
Statistics, have incorporated the software into their data dissemination
practice. For example, the Australian Bureau of Statistics confirms
that: "Since 2007 SUDA has been a part of the standard assessment
process for all CURFS [Confidentialised Unit Record Files]
released by the ABS. In particular SUDA is used to help identify risky
records. Generally, as a result of the SUDA analysis, a number of
records on each CURF are given additional confidentiality treatment to
mitigate the identification risk" [2].
5. The provision of certificated safe researcher training
delivered via the Administrative Data Liaison Service (ADLS), a service
set up to support research using administrative data (www.adls.ac.uk).
The primary goal of the training is to educate participants in dealing
with data safely; it is attended by DSO staff as well as researchers.
The course is part of the Safe Researcher Training Programme and is
endorsed by major administrative data holding organisations in the UK
and the Information Commissioner's Office. By the end of 2013 this
course will have run 14 times, with each session attended by 20-50 participants. The
research is embedded throughout the course and is the basis of the most
important module of the course. The course is an approved method for
access to administrative data held by the Scottish government and NHS
Scotland.
6. The development of the UK Anonymisation Network (UKAN), set
up to establish best practice in anonymisation. Led by Elliot and
funded by the UK Information Commissioner, UKAN offers practical advice
and information to anyone who handles personal data and needs to share
it (www.ukanon.net). The network
provides a range of resources to private and public sector data holders
in the UK, including anonymisation clinics (where individual
DSOs attend to receive guidance on specific data anonymisation
problems), and a web site providing case studies of disclosure
control in practice, including several derived from UoM research.
In a joint press release, the UK Information Commissioner said: "The
work of UKAN will help build on the recommendations laid down in the
ICO's data protection code of practice on managing the risks related
to anonymisation which we published last year" [9].
The information economy requires the free flow of data, but privacy
breaches lead to a breakdown of trust, respondent non-co-operation and DSO
over-cautiousness. Armed with the tools and methods developed from this
research, DSOs can proceed with greater confidence, and make more informed
data release decisions. As ONS state: "The long-term impact of the
reports above and ongoing collaborative work with Dr Elliot has meant
that ONS can demonstrate a rigorous and considered approach to data
dissemination given a fuller understanding of the potential disclosure
risk. This helps ONS to achieve its vision to produce high quality
trusted statistics that meet user needs." [4] Similarly, the US
Census Bureau agrees that: "The broader impact of this work is to help
ensure that we can disseminate high quality data products whilst
maintaining the confidentiality of respondents" [5].
In sum, without this work, delivery of the open data agenda would be
significantly hampered.
Sources to corroborate the impact
(all claims referenced in the text)
[1] Testimonial from Chief Executive Officer, Privacy Analytics, Inc. (29th
August 2013)
[2] Testimonial from Chief Methodologist, Australian Bureau of Statistics
(27th February 2013)
[3] (2013) Report and Covering Document from Statistics Canada (18th
February)
[4] Testimonial from Head of Statistical Disclosure Control Branch,
Office for National Statistics (10th April 2013) & (2013)
Elliot, M. & Mackey, E. `EUL to OGD: A Risk Analysis (phase 1) Report'
(August)
[5] Testimonial from Chair, Disclosure Review Board, US Census Bureau (6th
June 2013)
[6] (2013) Elliot, M. `Disclosure Risk Audit of the Smart Steps Data
Service For Telefonica'
[7] Testimonial from Head of Fire Statistics, Department for Communities
and Local Government (31st May 2013)
[8] (2004) CAPRI/UoM `Special Uniques Detection Algorithm User Guide'
(September)
[9] ICO/UKAN Press Release