Improving the Management of Uncertainty on the Web: UncertML
Submitting Institution
Aston University
Unit of Assessment
Computer Science and Informatics
Summary Impact Type
Societal
Research Subject Area(s)
Mathematical Sciences: Statistics
Information and Computing Sciences: Computation Theory and Mathematics, Information Systems
Summary of the impact
Aston University researchers developed and maintain the Uncertainty
Markup Language (UncertML) for quantitative specification and
interoperable communication of uncertainty measures in the Web. It is the
only complete mechanism for representation of uncertainty in a web
context. UncertML has been:
- Used in policy and decision making by UK (Food and Environment Research
Agency) and international (European Commission) government agencies, and
many research/industrial institutes;
- Presented at industrial/technical workshops, leading to ongoing
international collaborations with bodies such as national space agencies
(ESA and NASA) and government data providers;
- Accepted as a discussion paper for formal standardisation by the Open
Geospatial Consortium;
- Chosen by independent data providers for efficient sharing of complex
information and rigorous risk analysis across scientific domains such as
pharmacy, global soil mapping and air quality.
Underpinning research
UncertML is an underpinning technology enabling the communication and
management of quantitative information about error, data quality and
uncertainty. It addresses an important challenge in the fast-developing
global array of Web-based services and frameworks which collect, discover
and exchange data and perform modelling on that data: how to transfer and
use uncertainty information reliably and interoperably, so that scientists
and policymakers from different domains can apply it properly. Such
information is vital for estimating prediction reliability and risk.
During research for the EC-funded INTAMAP project [3.1], Dan Cornford
(Reader, at Aston since 1998), Lucy Bastin (Senior Lecturer, at Aston
since 2003) and Matthew Williams (PhD/postdoc at Aston, 2006-2012), with
initial input from Edzer Pebesma (now at the University of Muenster), designed
and developed UncertML, a markup language for describing probabilistic
uncertainty. UncertML was accepted as a discussion paper in 2009 by the
Open Geospatial Consortium [3.5]. UncertML remains the only complete
mechanism for representation of uncertainty in a Web context [3.3].
The work led to the EC-funded UncertWeb project, coordinated by Dan
Cornford. UncertWeb took forward the concept of interoperability into the
"Model Web" domain, where models and data resources exposed as Web
services can be composed to produce complex scientific and decision making
workflows [3.2, 3.5]. UncertWeb developed tools and technology for
managing uncertainty in such a setting, and UncertML is the keystone to
the overall UncertWeb framework. The statistical technology employed was
informed by the research findings of the MUCM and MUCM2 projects (see
section 3). Dan Cornford was the principal MUCM investigator at Aston and
was central to the creation of the Web-based MUCM toolkit which
complements UncertML-based tools developed in UncertWeb.
UncertML is maintained by Dan Cornford and Lucy Bastin. It has research
outcomes and uses as follows (these are later referenced by number in
section 4):
Outcome 1: Easy data exchange in the syntactic web. The UncertML
convention, its concrete encodings in JSON and XML, and Aston's APIs allow
fast and reliable exchange of error and uncertainty data between
applications.
Outcome 2: Links to the developing semantic web. UncertML's
URI-based vocabulary allows linkage between domains and easy extension of
widely-used commercial and open data formats to associate them with
well-defined statistical concepts.
Outcome 3: Enabling interoperability for scientific modellers.
UncertML as a shared convention allows scientists to leverage the power of
re-usable workflows where selected inputs may be changed for fast and
efficient scenario evaluation. It builds on the research findings of
pan-European projects with high-profile end-users, and on links outside
Europe with NASA, NOAA and Australasian researchers, to ensure relevance
of the modelled phenomena and usability of UncertML for a variety of real
scientific problems.
Outcome 4: Enabling interdisciplinary research: the modular,
domain-agnostic design of UncertML allows use across varied disciplines
such as hydrology, biochemistry and ecology, and easy combination with any
other Web standards.
Outcome 5: Supporting robust analysis of risk: the explicit
management of uncertainty allows its propagation through complex analyses
to assess the reliability of predicted results, and clearer communication
of uncertain results to the public and other stakeholders.
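As an illustration of Outcomes 1 and 5, the following minimal Python sketch shows how an explicitly described input uncertainty might be serialised, exchanged and propagated through a model by Monte Carlo sampling. The dictionary layout, the function names (normal, sample, propagate) and the toy model are assumptions made for this sketch only; they are not the UncertML schema or any Aston API.

```python
import json
import random
import statistics


def normal(mean, variance):
    """Describe a Gaussian quantity as a plain dict.

    Hypothetical layout for illustration; NOT the real UncertML encoding.
    """
    return {"type": "normal", "mean": mean, "variance": variance}


def sample(desc, n, rng):
    """Draw n realisations from a described distribution."""
    if desc["type"] != "normal":
        raise ValueError("unsupported distribution: " + desc["type"])
    sd = desc["variance"] ** 0.5
    return [rng.gauss(desc["mean"], sd) for _ in range(n)]


def propagate(model, inputs, n=10000, seed=0):
    """Monte Carlo propagation: sample each input, run the model, summarise."""
    rng = random.Random(seed)
    draws = {name: sample(desc, n, rng) for name, desc in inputs.items()}
    outputs = [
        model(**{name: values[i] for name, values in draws.items()})
        for i in range(n)
    ]
    return {
        "type": "summary",
        "mean": statistics.fmean(outputs),
        "variance": statistics.variance(outputs),
    }


def toy_yield(rainfall, temperature):
    """Hypothetical linear model standing in for a real scientific workflow."""
    return 2.0 * rainfall - 0.5 * temperature


inputs = {"rainfall": normal(10.0, 4.0), "temperature": normal(20.0, 1.0)}

# Serialise the uncertain inputs as JSON, as one application might send them
# to another, then decode and propagate them on the receiving side.
wire = json.dumps(inputs)
result = propagate(toy_yield, json.loads(wire))
print(result)
```

Because the inputs travel as self-describing distributions rather than single values, swapping in an alternative rainfall or temperature description only requires changing the entries of inputs; the propagation step is unchanged.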
References to the research
(the three that best indicate the quality are marked '*'). Citation
counts are from 31/7/2013.
3.1. * Pebesma, E., Cornford, D., Dubois, G., Heuvelink, G. B. M.,
Hristopoulos, D., Pilz, J., Stöhlker, U., Morin, G. and Skoien, J. O.,
2011. INTAMAP: the design and implementation of an interoperable automated
interpolation Web service, Computers and Geosciences, 37, 3, 343-352. (35
citations) DOI: 10.1016/j.cageo.2010.03.019
3.2. Williams, M., Cornford, D. and Bastin, L., 2008. Describing and
Communicating Uncertainty within the Semantic Web, 7th International
Semantic Web Conference, October 2008, Karlsruhe, Germany. (6
citations) URL: http://eprints.aston.ac.uk/10038/
3.5. * Bastin, L., Cornford, D., Jones, R., Heuvelink, G. B. M., Pebesma,
E., Stasch, C., Nativi, S., Mazzetti, P. and Williams, M., 2013. Managing
Uncertainty in Integrated Environmental Modelling: The UncertWeb
framework, Environmental Modelling and Software, 39, 116-134. (13
citations) DOI: 10.1016/j.envsoft.2012.02.008
3.6. Yang, K., Blower, J., Bastin, L., Lush, V., Zabala, A., Maso, J.,
Cornford, D., Diaz, P. and Lumsden, J., 2012. An Integrated View
of Data Quality in Earth Observation, Philosophical Transactions of
the Royal Society A, 371. (5 citations) DOI:
10.1098/rsta.2012.0072
Major grants, including the following, enabled and resulted from this
research; they provide evidence of its quality. As described in section 2,
Aston expertise was crucial to these projects.
INTAMAP project (INTeroperability and Automated MAPping, http://www.intamap.org):
Funded by the European Commission under the 6th Framework
programme (1.8m Euro). Ran from September 2006 to August 2009, led by Dr
Edzer Pebesma, then at Utrecht University.
UncertWeb project (Uncertainty in the Model Web - http://www.uncertweb.org).
Funded by the European Commission under the 7th Framework
programme (2.5m Euro). Ran from February 2010 to January 2013, led by Dr
Dan Cornford at Aston University. Tools developed in the project are
listed here: https://wiki.aston.ac.uk/foswiki/bin/view/UncertWeb/UncertWebSoftware
MUCM (Managing Uncertainty in Complex Models) and MUCM2 projects (http://mucm.ac.uk).
Funded by Research Councils UK (£3M total). Ran from June 2006 to
September 2012, led by Prof Tony O'Hagan at Sheffield University. Toolkit
available at
http://mucm.aston.ac.uk/toolkit.
GeoViQua project (QUAlity aware VIsualization for the Global Earth
Observation System of Systems, http://www.geoviqua.org/).
Funded by the European Commission under the 7th Framework
programme (4m Euro). Runs from February 2011 to January 2014, led by Dr
Joan Maso at the Autonomous University of Barcelona.
Details of the impact
(numbered outcomes are as declared in Section 2. Evidence items
(section 5) cited by letter, e.g., 5A, 5B. References (section 3) cited
as 3.1, 3.2 etc)
Our maintenance of UncertML as a free and open standard with excellent
documentation, supporting toolkits and version control has facilitated its
use by industrial, research and policymaking bodies worldwide to make more
informed use of their data in real-world settings. This open approach is
well established as the best route to impact for markup languages: their
impact depends on widespread adoption by many stakeholders. We have made
strategic use of collaborations (e.g. European projects, research
networks) to promote international usage of the language and tools. The
other key aspect of creating impact is standardisation, so that data can
be shared and systems interoperate. This is being actively pursued. In
March 2013, UncertML was specifically recommended by the US Library of
Congress for digital data collections [5G].
In 2012 the Aston team generated tools for utilising UncertML, including
a Web-based elicitation suite and tools for sensitivity analysis and model
emulation. The tools are already well adopted: e.g. the elicitation tool
alone has over 150 external users across the world, including the Centre
for Workforce Intelligence, US Fish & Wildlife Service, California
Ocean Science Trust, Swaziland National Trust Commission, UK Health
Protection Agency, Sandia National Laboratories (DoE), Australian Customs
and Border Protection Service and the Nuclear Decommissioning Authority
[5D]. (Outcomes: 1, 3, 4, 5)
Policy and decision-making (Outcomes: 1, 3, 4, 5).
The UK Food and Environment Research Agency (FERA) used UncertML and
associated Aston tools in 2011-12 to predict food security in the UK. FERA
created a workflow to assess the likely impact of climate change and
farmer behaviour on wheat production in the UK. This demonstrated that
when uncertainties are properly accounted for, the impact of climate
change on production is impossible to confidently predict using current
data. This important result highlights "the overwhelming risk of
policymaking with data of limited quality and the need for focussed
improvements in model inputs to predict and respond to future variability
in production and its economic effects" (FERA, 2012). This work was
presented in April 2012 to the Chief Scientist within the Department for
Environment, Food and Rural Affairs (DEFRA), and their intention is to use
similar approaches in policy setting in the future. UncertML specifically
contributes here because the workflow can be easily re-run with data
pulled from real-time sensor networks or alternative climate models with
no loss of statistical detail, allowing multiple scenarios to be tested
with much less investment of time and effort. [5A]
UncertML is integral to the eHabitat tool developed at the European
Commission's Joint Research Centre (Italy) as part of the Digital
Observatory for Protected Areas (DOPA). The tool allows stakeholders to
monitor and forecast biodiversity, and was presented to the UN Convention
on Biological Diversity. The Hyderabad meeting of October 2012 recognised these tools as
vital in building capacity towards Aichi Biodiversity Target 11, which
aims to safeguard inland water, marine and coastal ecosystems. [5B]
Environmental forecasting, monitoring and protection (Outcomes: 1, 3,
5).
The Norwegian Institute for Air Research (NILU) used UncertML and
associated technologies in 2012 to develop a prototype probabilistic air
quality forecasting system for Oslo, Norway. The system allows NILU to
provide uncertainty estimates on forecasts of two key pollutants affecting
human health, nitrogen oxides and particulate matter. NILU are "exploring
the production of an operational system which will communicate confidence
in the forecasts to decision makers (particularly local authorities), who
may then better inform the public and plan the management of the transport
system" (NILU, June 2013). [5C]
Contribution to international standards (Outcomes: 1, 2, 3, 4).
UncertML was taken to the Open Geospatial Consortium (OGC), the main
international geospatial standards body, as a discussion paper [3.4]. The
online UncertML vocabulary and schemas are accessed by over 200 unique
users per month [5D].
UncertML was adopted within the 2010 GEOSS architectural implementation
pilot 3 (AIP-3) as the sole recommended method for harmonising the
representation of accuracy [3.3]. GEOSS (Global Earth Observation System
of Systems) is the global network of content providers allowing decision
makers to access information from observing systems such as satellites and
sensor networks.
This work led to the GeoViQua project, which incorporates uncertainty
into GEOSS and into the GEO (Group on Earth Observations) portal [5C,
3.6]. GeoViQua's UncertML-based schemas were accessed by 378 unique users
last year [5E].
UncertML underlies NetCDF-U, an uncertainty-enabled profile of the
popular NetCDF data transmittal standard. NetCDF-U was developed at the
Institute for Atmospheric Pollution, Italy, presented to the OGC's Earth
System Science Domain Working Group and taken up as a discussion paper in
November 2011.
UncertML has been actively used since mid-2012 in data encodings for other
scientific domains: e.g., CellML, PharmML and Systems Biology Markup
Language (developed at the European Bioinformatics Institute) [5F].
International collaboration and data sharing (Outcomes: 1, 2, 3, 4,
5).
UncertML is widely used to define uncertainties unambiguously in Earth
observation. In Europe UncertML was used (2010-present) across several
major projects including ENVISION, NETMAR, EO2Heaven, EnviroFI and SANY
[5H]. These projects represent key developments towards an interoperable
geospatial infrastructure and will have significant long term benefits for
the ways we observe and model our environment.
In Australasia, UncertML is used by government institute Landcare
Research New Zealand in their GSMML data format, proposed as the
international standard for the Global Soil Map [5I], which aggregates data
from 194 countries to generate a vital resource for 'better decisions in
global issues like food production, hunger eradication, climate change,
and environmental degradation'.
Sources to corroborate the impact
A. Contact for evidence of FERA impact: Geographical Information
Scientist, The Food and Environment Research Agency, UK.
B. In the COP11 meeting of 2012 (Hyderabad, India) one of the 33
decisions taken specifically references the DOPA tools, which rely on
UncertML for handling of uncertain data, as vital in building capacity
towards Aichi Biodiversity Target 11:
http://www.cbd.int/doc/decisions/COP-11/cop-11-dec-24-en.doc,
section 8.
C. Contact for evidence of NILU impact: Senior Scientist,
Norwegian Institute for Air Research (NILU), Norway.
D. http://uncertml.org/stats/index.html
Access statistics demonstrating that UncertML vocabulary and schema
implementations at http://www.uncertml.org/
are regularly accessed by over 140 unique users per month. A list of
registered elicitator users is also supplied.
E. http://schemas.geoviqua.org/webstats/
Access statistics demonstrating that GeoViQua's UncertML-based
schemas were accessed by 378 unique users in the year to 31/7/2013.
F. SBML (Systems Biology Markup Language) is very widely used in
biological modelling and has adopted UncertML. Contact for evidence of
SBML and PharmML impact: Biomodels Team, European Bioinformatics
Institute (EMBL-EBI), UK. The CellML community also aims to use UncertML;
see the Introduction / Future Work sections of this 2012 PLOS paper:
http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0039721
G. http://www.digitalpreservation.gov/formats/content/gis_intro.shtml
Sustainability of Digital Formats Planning policy document recommending
that UncertML be used alongside established global standards such as FGDC
and ISO in maintaining data collections for the US Library of Congress.
H. http://www.envirofi.eu/Portals/89/Docs/Project/Public_deliverables/ENVIROFI%20D5.3_ENVIROFI_data_and_meta_information_specifications.pdf
(pages 28, 33, 37, 40). How the EnviroFI project uses UncertML to enable
the 'Future Internet'. Contact for evidence of NETMAR impact: Senior Earth
Observation Scientist, Plymouth Marine Laboratory, UK.
I. http://adsabs.harvard.edu/abs/2013EGUGA..15.6761R.
Landcare Research New Zealand's GSMML profile, in which UncertML is a key
component. This profile is proposed as the international standard for the
Global Soil Map. Contact for evidence of Landcare impact:
Environmental Information & System Analyst, Landcare Research, New
Zealand.