Impact of Machine-Learning based Visual Analytics
Submitting Institution: Aston University
Unit of Assessment: Computer Science and Informatics
Summary Impact Type: Technological
Research Subject Area(s): Information and Computing Sciences: Artificial Intelligence and Image Processing, Computation Theory and Mathematics, Information Systems
Summary of the impact
Visual analytics is a powerful method for understanding large and complex
datasets that makes information accessible to non-statistically trained
users. The Non-linearity and Complexity Research Group (NCRG) developed
several fundamental algorithms and brought them to users by developing
interactive software tools, e.g. the Netlab pattern analysis toolbox in
2002 (more than 40,000 downloads) and the Data Visualisation and
Modelling System (DVMS) in 2012.
Industrial products. These software tools are used by industrial
partners (Pfizer, Dstl) in their business activities. The algorithms have
been integrated into a commercial tool (p:IGI) used in geochemical
analysis for oil and gas exploration with a 60% share of the worldwide
market.
Improving business performance. As an enabling technology, visual
analytics has played an important role in the data analysis that has led
to the development of new products, such as the Body Volume Index, and the
enhancement of existing products (WheelRight: automated vehicle tyre
pressure measurement).
Impact on practitioners. The software is used to educate and train
skilled people internationally at more than six institutions and
is also used by finance professionals.
Underpinning research
The extraction and visualisation of information from complex datasets —
such as those found in industry and engineering, the health sector and
biological research — is an important area of research internationally.
The NCRG at Aston has become one of the leading international research
groups in machine learning and its application to data analysis and
visualisation.
The NCRG has a strong research record in pattern analysis and machine
learning, especially for feature extraction and visualisation. The group's
research has included major advances in theory and in the developments of
new algorithms. For example, researchers in the NCRG invented and
developed a number of important data visualisation and density models that
are exploited around the world. GTM (Generative Topographic Mapping) [3.1]
was developed in 1996 by three Aston staff: Bishop (Professor 1993-7),
Svensen (PhD/postdoc 1995-8) and Williams (Lecturer 1995-8).
NeuroScale [3.2, 3.3] was developed by two Aston staff: Lowe
(Professor 1993-present) and Tipping (postdoc). GTM was extended
into a hierarchical model in 2002 [3.5] by Tino (postdoc 2000-3) and
Nabney (then Senior Lecturer; at Aston 1995-present). These models
project data non-linearly from a high-dimensional space to a lower
(usually two-dimensional) space where it can be plotted. The algorithms
have been shown to provide a significant improvement in performance and
interpretability over conventional algorithms, such as Principal Component
Analysis. The GTM provides a probabilistic model of the data, while
NeuroScale is topographic (distance-preserving).
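To make the distance-preserving idea concrete, here is a minimal NumPy
sketch (our illustration, not the Netlab implementation) of a
NeuroScale-style projection: a fixed-centre RBF network maps data to 2-D
and its output weights are trained by gradient descent on the stress
between pairwise distances in data space and in the projection. The
centre count, width heuristic, learning rate and iteration count are all
illustrative choices; NeuroScale proper trains the weights far more
efficiently, e.g. via the shadow-targets algorithm of [3.3].

```python
# Sketch of a NeuroScale-style distance-preserving projection (illustrative,
# not the Netlab code): train the output weights W of a fixed-centre RBF
# network to minimise the stress sum_{i<j} (d_ij - d*_ij)^2, where d*_ij are
# pairwise distances in data space and d_ij in the 2-D projection.
import numpy as np

def neuroscale_sketch(X, n_centres=20, n_iter=500, lr=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    centres = X[rng.choice(n, n_centres, replace=False)]   # fixed RBF centres
    width = np.median(np.linalg.norm(X[:, None] - centres, axis=2))
    Phi = np.exp(-np.linalg.norm(X[:, None] - centres, axis=2) ** 2
                 / (2 * width ** 2))                       # n x n_centres design matrix
    D_star = np.linalg.norm(X[:, None] - X, axis=2)        # target (data-space) distances
    W = rng.normal(scale=0.1, size=(n_centres, 2))         # output weights -> 2-D
    for _ in range(n_iter):
        Y = Phi @ W                                        # current projection
        diff = Y[:, None] - Y                              # pairwise differences
        D = np.linalg.norm(diff, axis=2) + np.eye(n)       # +I avoids divide-by-zero
        coeff = (D - D_star) / D
        np.fill_diagonal(coeff, 0.0)
        grad_Y = 2 * (coeff[:, :, None] * diff).sum(axis=1)  # dStress/dY
        W -= lr * Phi.T @ grad_Y                           # chain rule through Y = Phi W
    return Phi @ W
```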
Building on these concepts, the NCRG has established a unique capability
through the development of new algorithms and the creation of software
tools that implement those algorithms. This work came together in 2002 in
the development of Netlab, a toolbox of open-source Matlab pattern
analysis software that provides a platform for further research,
application development, and technology transfer. The accompanying
textbook [3.4] has been very influential. Netlab contains a wide range of
pattern analysis algorithms, including GTM and NeuroScale, and is freely
available at www1.aston.ac.uk/eas/research/groups/ncrg/resources/netlab/downloads/.
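To illustrate the probabilistic model behind GTM [3.1], the fragment
below is a from-scratch NumPy sketch of the published algorithm (not a
copy of the Netlab code): a regular grid of latent points is mapped
through an RBF network into data space, defining a constrained Gaussian
mixture that is fitted by EM, and each data point is visualised at its
posterior-mean position on the latent grid. The grid sizes, basis width,
ridge term and random initialisation are illustrative simplifications
(the real algorithm is usually initialised from PCA).

```python
# Illustrative GTM sketch: latent grid -> RBF mapping -> constrained Gaussian
# mixture in data space, fitted by EM; returns posterior-mean latent positions.
import numpy as np

def gtm_sketch(X, grid=10, n_basis=4, n_iter=30, ridge=1e-3, seed=0):
    n, d = X.shape
    g = np.linspace(-1, 1, grid)
    U = np.array([(a, b) for a in g for b in g])           # K latent grid points
    c = np.linspace(-1, 1, n_basis)
    C = np.array([(a, b) for a in c for b in c])           # M RBF basis centres
    sigma = 2.0 / (n_basis - 1)
    Phi = np.exp(-((U[:, None] - C) ** 2).sum(-1) / (2 * sigma ** 2))
    Phi = np.hstack([Phi, np.ones((len(U), 1))])           # bias column
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(Phi.shape[1], d))      # mapping weights
    beta = 1.0 / X.var()                                   # inverse noise variance
    for _ in range(n_iter):
        Y = Phi @ W                                        # mixture centres in data space
        dist2 = ((X[None] - Y[:, None]) ** 2).sum(-1)      # K x N squared distances
        logR = -0.5 * beta * dist2                         # E-step: responsibilities
        logR -= logR.max(0)
        R = np.exp(logR)
        R /= R.sum(0)
        G = R.sum(1)                                       # effective count per latent point
        A = Phi.T @ (G[:, None] * Phi) + ridge * np.eye(Phi.shape[1])
        W = np.linalg.solve(A, Phi.T @ (R @ X))            # M-step: weighted least squares
        dist2 = ((X[None] - (Phi @ W)[:, None]) ** 2).sum(-1)
        beta = n * d / (R * dist2).sum()                   # M-step: noise update
    return R.T @ U                                         # posterior-mean 2-D positions
```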
Over the years, the group has both advanced the theoretical underpinnings
of data visualisation and carried out research into the practice of
visualisation, especially for non-statistical users [3.6]. This work led
to the use of the Netlab
toolkit to build applications in a wide range of domains. Drug discovery
and microarray analysis provided the initial proving grounds for these
visualisation technologies. A generic data visualisation tool, DVMS (Data
Visualisation and Modelling System), was developed with Pfizer in 2006 and
has been freely available on the NCRG website (see link above) since March
2012. This tool brings together the novel data projection algorithms
developed in the NCRG with a rich interface that allows the
non-statistically trained user to interactively interrogate and explain
features of the dataset.
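DVMS itself is a stand-alone compiled tool; the hypothetical matplotlib
sketch below (function and argument names are ours) illustrates the core
interaction pattern described above: clicking a point in the 2-D
projection reveals the original high-dimensional record, so a user
without statistical training can relate structure in the plot back to the
raw features.

```python
# Hypothetical sketch of the DVMS-style interaction: X holds the raw data,
# Y a 2-D projection of it (e.g. from GTM or NeuroScale), and clicking a
# projected point prints the corresponding original record.
import matplotlib.pyplot as plt

def explore(X, Y, feature_names):
    fig, ax = plt.subplots()
    ax.scatter(Y[:, 0], Y[:, 1], picker=True, pickradius=5)
    ax.set_title("2-D projection: click a point to inspect it")

    def on_pick(event):
        for i in event.ind:                    # indices of the picked points
            record = ", ".join(f"{name}={value:.3g}"
                               for name, value in zip(feature_names, X[i]))
            print(f"point {i}: {record}")

    fig.canvas.mpl_connect("pick_event", on_pick)
    plt.show()
```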
References to the research
All citation counts are taken from Google Scholar on 15th
October 2013. The three references that best indicate the quality of the
underpinning research are indicated with a triple asterisk ***.
3.1. *** Bishop, C. M., Svensen, M. and Williams, C. K. I., GTM:
The Generative Topographic Mapping. Neural Computation 10 (1), 215-235,
1998. doi:10.1162/089976698300017953. Evidence for quality: 1009
citations, of which fewer than 20 are self-citations or from Aston
authors. It is still being cited more than 60 times per year. The journal
(5-year impact factor of 2.5) is among the top handful in the field.
3.2. *** Lowe, D. and Tipping, M. E., Feed-forward neural
networks and topographic mappings for exploratory data analysis. Neural
Computing and Applications 4, 83-95, 1996. doi:10.1007/BF01413744.
Evidence for quality: 120 citations. This application-oriented journal was
chosen for its potential to influence practitioners.
3.3. Tipping, M.E. and Lowe, D., Shadow Targets: A Novel Algorithm for
Topographic Projections by Radial Basis Functions, Neurocomputing, 19,
211-222, 1997. doi:10.1016/S0925-2312(97)00066-0. Evidence for quality: 87
citations for this paper and its conference version.
3.4. Nabney, I. T., Netlab — Algorithms for Pattern Recognition. Springer
Verlag, London, 2002. Software available from http://www1.aston.ac.uk/eas/research/groups/ncrg/resources/netlab/
ISBN: 1852334401. Evidence for quality: the textbook has been through three
reprints, has been cited 918 times, and is still being cited more than 50
times per year. Software has had more than 40,000 downloads since first
being made available.
3.5. *** Tino, P. and Nabney, I. T., Hierarchical GTM:
Constructing localized non-linear projection manifolds in a principled
way. IEEE Transactions on Pattern Analysis and Machine Intelligence 24,
639-656, 2002. doi:10.1109/34.1000238. Evidence for quality: 68 citations.
This is a top international journal in the machine-learning field (5-year
impact factor more than 6) with peer review standards to match.
3.6. Maniyar, D. M. and Nabney, I. T., Visual data mining using
principled projection algorithms and information visualization techniques,
ACM SIGKDD, 643-648, 2006. doi:10.1145/1150402.1150481. Evidence for
quality: 8 citations. Won a best student paper prize at the conference.
The following collaborative grants with industry in the census period have
developed and made use of visualisation techniques based on the research
above; they are evidence of the quality of the underpinning research and
of the Unit's continued activity in this domain.
A. Nabney, EPSRC CASE with IGI Ltd. (£81k) 2007-2010. "Visualisation of
geochemical data"
B. Nabney, BBSRC CASE with Pfizer Central Research (£78k) 2007-2011.
"Improved in silico prediction of molecular properties"
C. Nabney, EPSRC CASE with AgustaWestland (£82k) 2011-present.
"Diagnostics and inference for helicopter vibration data"
D. Nabney, Centre for Defence Enterprise with Daden (£32k) 2011. "Use of
virtual immersive worlds for data visualisation"
E. Nabney, short KTP with Lein Diagnostics (£20k) 2012. "Improved glucose
monitoring for diabetics"
F. Nabney, WheelRight Ltd. (£34k) 2012-present. "Pattern analysis to
improve in-road measurement of tyre pressure"
G. Lowe, Centre for Defence Enterprise (Dstl), (£24k) 2011. "Multi-Source
Intelligence: Challenge 2: Probabilistic Visualisation Maps"
H. Lowe, EPSRC Industrial CASE with Thales (£90k) 2011-present.
"Topographic Information Visualisation"
Details of the impact
Process
Visual analytics is a powerful and generic technology for providing insight
into data: we have therefore sought to create impact in a wide range of
domains. The process for creating impact started with the implementation
of a well-engineered software toolbox (Netlab) that is reusable, can be
deployed quickly, and forms the underpinning technology for larger
systems. Netlab supported the creation of user-centred tools such as DVMS
(originally for Pfizer) and milva (gene expression array analysis).
DVMS in particular has been a key step in creating impact since it has
enabled researchers at Aston to deploy visualisation methods on a wide
range of projects (subsections 4.2 and 4.3) and work directly with end
users (who do not have statistical training) to understand the data and
thus improve the accuracy of data analysis. Our standard contracts allow
the reuse of generated IP in non-competing domains, which has enabled us
to build the capability of our visual analytics tools across multiple
projects for different companies. Thus high-quality data visualisation is
a fundamental part of the data modelling process on many industrial
projects carried out by the NCRG.
The dissemination strategy has had impact at its heart from the
beginning. Papers were published in both scientific and application
journals; inter-disciplinary conferences (e.g. SIGKDD) and user/industrial
workshops (e.g. Natural Computing Applications Forum, Chemometrics SIG)
have been targeted; publicly available open-source software and industrial
training courses (designed in collaboration with industry) increased the
impact on practitioners.
Applications where visual analytics is particularly useful are
characterised by large quantities of uncertain information that
non-statistically trained users need to interpret and analyse;
organisations with this sort of data have therefore been targeted.
Relationships with companies have been formed through personal networks
(e.g. Pfizer, IGI, Thales) and contacts made through Aston's Business
Partnership Unit (e.g. WheelRight, Daden, AgustaWestland, Lein): a mixture
of special events targeted at companies with appropriate data and
company-led interactions (often inspired by the high visibility of the
NCRG and Netlab).
1. Industrial products
A number of companies have incorporated the data visualisation work of
the NCRG directly into their products.
- Pfizer Central Research has supported our work since 2000, funding the
  development of practical visualisation algorithms and methods [5.1]
  (through 2 directly funded PhDs and 1 CASE studentship, with BBSRC
  funding a postdoc) and collaborating on joint papers. This work
  culminated in the development (2008) of an interactive visualisation
  tool used by Pfizer's chemists and biologists (rather than
  statisticians) to interpret and analyse screening results (e.g.
  biological activity, toxicity etc.).
- Integrated Geochemical Interpretation Ltd, a petroleum geochemistry
consultancy company which operates world-wide, sells p:IGI, a software
product for oil and gas exploration geochemistry with a 60% share of the
worldwide market. IGI co-funded a PhD CASE student [5.2] who, after
completing his thesis in 2010, worked for the company to implement our
visualisation algorithms to enhance the p:IGI tool [5.7]. The company
started a £500k TSB grant in 2013 (jointly with Aston and Daden, see
also below) to drive further development of this product and enable them
to expand into new business sectors.
- Intelligence data visualisation: A prototype information visualisation
system was delivered to Dstl (Defence Science and Technology Laboratory)
in 2011 under a contract [3.G, 5.4] on Collaborative Multi-source
Intelligence related to integrating spatio-temporal and network analysis
incorporating measures of uncertainty.
- Thales is now exploring the use of topographic information
  visualisation using very high-dimensional submarine sonar array data as
part of a collaborative industrial CASE project. The idea is to
synthesise the vast amounts of information into a form suitable for
human interpretation by skilled operators on board submarines.
2. Improving business performance
Visual analytics and the DVMS software have played a fundamental role in
collaborative industrial research projects, helping to understand data,
detect outliers and select important features. This improved analysis has
led to the development of new and improved products.
We worked (2010-11) with Daden, an SME based at Birmingham Science Park
Aston, on a £35k contract for Dstl (the Cyber and Influence Centre, part
of the Centre for Defence Enterprise) for immersive data visualisation and
its application to defence intelligence analysis. The results of this
research formed part of the case for a significant corporate investment in
Daden from BAE Systems and the development of the Datascape product [5.6]
in 2012-13.
Nabney consulted for Select Research Ltd. [5.3] to develop the
data visualisation and analysis for the Body Volume Index (BVI), a novel
obesity metric intended to replace BMI, based on volumetric measures from
a white-light scanner. This metric is much better correlated with
biomarkers of vascular disease than BMI. Select launched BVI in November
2010 and it has undergone trials for the NHS.
In a CASE studentship with AgustaWestland [3.C] to develop a condition
monitoring system for helicopter airframes, DVMS has been used to select
frequency bands and sensors that will provide the best detection of
faults. The visual nature of the results has enabled us to explain the
selection process to the engineers and demonstrate the transitions between
different flight modes. The result is a prototype for detecting
the need for structural maintenance on the airframe.
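The band and sensor selection itself was done through DVMS's visual
output; as a rough quantitative analogue (our assumption for
illustration, not the project's actual method), one could rank candidate
frequency bands by a simple Fisher-style class-separation score before
plotting only the most promising ones:

```python
# Illustrative band ranking: score each candidate frequency band by a
# simplified (unweighted) Fisher criterion -- between-class variance over
# within-class variance -- for separating "fault" from "no fault" examples.
import numpy as np

def fisher_score(feature, labels):
    classes = np.unique(labels)
    overall = feature.mean()
    between = sum((feature[labels == c].mean() - overall) ** 2 for c in classes)
    within = sum(feature[labels == c].var() for c in classes) + 1e-12
    return between / within

def rank_bands(X, labels, band_names):
    """X: examples x bands matrix of band energies; returns bands best-first."""
    scores = [fisher_score(X[:, j], labels) for j in range(X.shape[1])]
    order = np.argsort(scores)[::-1]
    return [(band_names[j], scores[j]) for j in order]
```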
WheelRight [5.5] makes sensors for measuring vehicle tyre pressure which
are embedded in the road so that measurement is performed simply by
driving over the sensors and the results are sent by text to the driver.
Tyre pressure maintenance is typically done poorly and irregularly;
WheelRight's goal is to make the process automatic, specific, accurate
and easy to use. A study by Bridgestone Tyres estimates that tyre
under-inflation costs the EU €2.8 billion each year in wasted fuel
and adds 4.8 million tons of CO2 emissions, largely because
drivers are unwilling to use the current tools. Automated in-road
measurement offers a way to radically reduce this waste.
In a directly funded research contract (2012-13) we have significantly
improved the results from WheelRight's vehicle tyre pressure measurement
instrument. We used visualisation to understand the complex data signals
and then developed new predictive models that significantly improved the
measurement accuracy, particularly on HGVs and buses, so that it now meets
the requirements of their customers. WheelRight have installed three
complete systems, used each day by a range of vehicle types. At a
municipal bus depot with a fleet of 80 buses, over 2,000 tyre pressures
are taken each week; at an HGV fleet operator this increases to over
14,000; and at the entrance to a technology site over 10,000 are
recorded. In total the company has automatically recorded over a million
tyre pressures in a twelve-month period in 2012-13 from these three sites
alone. Taking this number of manual tyre pressures would be impractical
and very expensive; estimated labour costs would exceed £500k p.a. The
bus operator is achieving cost savings exceeding £50,000 p.a. from lower
fuel consumption, longer tyre life and reduced labour. The system has
also undergone tests in 2013
by the National Measurement Office to confirm the system's accuracy for
all vehicle types.
Building on the analysis of clinical trial data (2012) by Aston
researchers using visualisation techniques, Lein Diagnostics has developed
an optical measurement device for blood glucose. In 2013, Lein received
further support (£94k) from the NHS to develop an improved
version of the meter. The aim is to develop a meter that will achieve the
measurement accuracy specified by the ISO and FDA in order to displace the
widely disliked finger-stick meters used by diabetics.
3. Impact on practitioners
The primary means of impact on practitioners is the Netlab toolbox. So
far there have been more than 40,000 downloads of Netlab by academic
and business users worldwide since its release. Currently, it is averaging
6,200 page-views per year (thus c. 34,000 over the REF census period) and
3,363 downloads per year (thus c. 18,500 over the REF census period).
A portfolio manager at Tudor Capital Europe LLP (who carried out a PhD at
Aston in data visualisation), in charge of a large macroeconomics trading
portfolio, uses data visualisation and dimensionality reduction methods
(particularly GTM) as a key tool in his portfolio management strategy. He
uses visualisation to interpret the very large number of macroeconomic
indicators, which often helps him identify new macro trading themes earlier
than most market participants.
Several universities also use the Netlab software toolkit and associated
book [3.4] in pattern recognition courses, including: Edinburgh;
Southampton; Portland State University; the University of California, San
Diego; the University of Warsaw; and the Czech Technical University in
Prague. This means that training in these data visualisation algorithms is
not confined to Aston University, thus increasing the supply of
highly-skilled practitioners.
Sources to corroborate the impact
5.1. Pfizer Central Research: Hub Submissions Manager
5.2. IGI Ltd.: Geochemical Consultancy Manager
5.3. Select Research Ltd.: Managing Director
5.4. Dstl: Fellow in Informatics
5.5. WheelRight: Chief Executive
5.6. Datascape http://www.daden.co.uk/solutions/datascape/
5.7. p:IGI http://www.igiltd.com/pigi-3.html