Data maps with applications to medical diagnostics and monitoring
Submitting Institution
University of LeicesterUnit of Assessment
Mathematical SciencesSummary Impact Type
TechnologicalResearch Subject Area(s)
Mathematical Sciences: Pure Mathematics, Statistics
Information and Computing Sciences: Computation Theory and Mathematics
Summary of the impact
Advanced technologies for data visualisation and data mining, developed
in the Unit in collaboration with national and international teams, are
widely applied for development of medical services. In particular, a
system for canine lymphoma diagnosis and monitoring developed with [text
removed for publication] has now been successfully tested using clinical
data from several veterinary clinics. The risk maps produced by our
technology provide early diagnosis of lymphoma several weeks before the
clinical symptoms develop. [text removed for publication] has estimated
the treatment test, named [text removed for publication], developed with
the Unit to add [text removed for publication] to the value of their
business. Institute Curie (Paris), applies this data mapping technique and
the software that has been developed jointly with Leicester in clinical
projects.
Underpinning research
The problems related to large data set analysis and visualisation,
model reduction and the struggle with complexity of data sets are
important for many areas of human activity. The identification of hidden
geometry and topology in noisy data sets is a challenging task. Many
branches of data analysis aim to solve such problems under some additional
assumptions that simplify the problem. However, the verification of these
assumptions may be more complicated than the solutions of the problems. A
universal technology for uncovering the hidden structure is very
desirable. An answer to this challenge cannot be simple because it must
potentially cover the majority of situations.
In the 1990s Levesley and Light produced theoretical results concerning
the approximation power of neural networks. This work led to Levesley's
involvement in a simple neural network model for the prediction of acute
rejection of kidney transplants together with pathologists from the
University of Leicester [3.7]. The Unit recognised the potential
for impact of research in this area, leading to the development of a team
under the leadership of Gorban with more specific expertise in the theory
and practical application of neural networks.
In summary, we have developed a universal technology for revealing and
visualising the hidden structure in data. For this purpose, we have used
ideas both old and new:
- The oldest of them is the idea of self-consistency introduced by H.
Steinhaus in 1957 (k-means) and then recognized as a very general and
productive idea that can be used for construction of many principal
objects like principal manifolds and principal graphs (Husty at al,
1984). This idea is an intrinsic part of the self-organizing maps (SOM)
and many data approximation approaches also.
- The application of quadratic elastic energy functionals is a basic
idea in spline approximation and is used by us for construction of
principal manifolds, in the elastic maps technology [3.6].
- Gorban and Zinovyev (Curie) developed the topological grammars
approach for data analysis [3.3] based on the idea of graph
grammars.
- We use the pluriharmonic embeddings of graphs into data space as the
ideal approximators [3.5] and developed optimization methods to
minimize the deviation of data approximants from the pluriharmonic
graphs.
- The idea of robust growth makes the whole approach more efficient. For
the organization of robust grows, we use truncated energy functionals.
In the splitting algorithms of optimization they also produce systems of
linear equations, and make the construction of the approximators much
more stable in presence of noise and outliers.
Most of the ideas are implemented in user-friendly software and can be
applied to many real-life problems.
For the development of applied systems we combine our original technology
with more classical approaches like decision trees, advanced kNN method
and Bayesian networks. For example, for the canine lymphoma diagnosis we
have tested more than 2,000,000 versions of combinations of known and our
novel data mining approaches, and the best solutions have been implemented
in JAVA (web-accessible) software. It is shown that for the differential
diagnosis of clinically vulnerable patients, the sensitivity (proportion
of correct prediction of positive results) of the system is 83.5%, and
specificity (proportion of correct prediction of negative results) is 77%.
For caninelymphoma screening purposes, the best data mining solution we
found has sensitivity 81.4% and specificity >99%.
On base of case-study, which has been done, the best solution for each
problem has been selected. The results obtained from case-study are
extremely favourable compared to many current human cancer screening tests
that rely upon single biomarkers. These include the current CA-125 screen
for human ovarian cancer (sensitivity approximately 50% and specificity
98% [3.1]) and the male PSA test (sensitivity approximately 65% and
specificity 35% [3.2]).
References to the research
Publications
(1) A.N. Gorban, A. Zinovyev, Principal manifolds and graphs in practice:
from molecular biology to dynamical systems, International Journal of
Neural Systems 20 (3) (2010), 219-232.
(2) A.N. Gorban, A. Y. Zinovyev, Principal Graphs and Manifolds, Chapter
2 in: Handbook of Research on Machine Learning Applications and
Trends: Algorithms, Methods, and Techniques, Emilio Soria Olivas et
al. (eds), IGI Global, Hershey, PA, USA, 2009, pp. 28-59.
(3) A.N. Gorban, N.R. Sumner, and A.Y. Zinovyev, Topological grammars for
data approximation, Applied Mathematics Letters, 20 (4) (2007),
382-386.
(4) A. Zinovyev, E. Mirkes, Data complexity measured by principal graphs,
Computers & Mathematics with Applications, Volume 65, Number
10, 1471-1482.
(5) A.N. Gorban, B. Kegl, D. Wunsch, A. Zinovyev (Eds.), Principal
Manifolds for Data Visualisation and Dimension Reduction, Lecture
Notes in Computational Science and Engineering, Vol. 58, Springer, Berlin
— Heidelberg — New York, 2008. (ISBN 978-3-540-73749-0).
(6) A. Gorban, A. Zinovyev, Elastic Principal Graphs and Manifolds and
their Practical Applications, Computing 75 (2005), 359-379.
(7) Furness
PN, Levesley
J, Luo
Z, Taub
N, Kazi
JI, Bates
WD, Nicholson
ML., A neural network approach to the biopsy diagnosis of early
acute renal transplant rejection, Histopathology, Volume 35
(1999), 461-467.
Grant
Data Mining for Lymphoma Differential Diagnosis, A University of
Leicester Innovation Partnership with [text removed for publication],
2012. European Regional Development Fund.
Details of the impact
Joint work with Institute Curie (Paris, France) started in 2004. This is
one of the top European cancer research and treatment centres. Together
with the Bioinformatics Unite of Institute Curie, we have developed a
software library which implements most of our methods. This software is
now open for non-commercial use worldwide [5.2]. Institute Curie
uses this software in various projects for visualization and analysis of
microarrays for various types of cancer, for visualization of clinical and
biochemical data [5.2].
Publication [5.3] demonstrates knowledge transfer impact as the
IC-MSQUARE conference is dedicated to application of mathematics in other
science and technology, and the author list of the paper has two member of
the University (Gorban and Mirkes) and three colleagues from [text removed
for publication] (Alexandris, Slater and Tuli).
Use in Humans
Many institutions and clinics in various countries have reported
successful use of these methods and software for clinical purposes [5.2]:
- The Ukrainian Medical Almanac [5.6] reported two new
applications: (i) Prediction of treatment result of long bones fracture
for diabetes patients, (ii) Pain management and quantitative estimation
of pain.
- Dr. Arndt Benecke (joint affiliation at Institut de Génétique et de
Biologie Moléculaire et Cellulaire, CNRS/INSERM/ULP, Collège de France
and Institut des Hautes Etudes Scientifique, France) used the method of
elastic maps for analysis of microarray data in cancer. This experience
was reflected in the subsequent publication [5.8].
Use in Animals
The treatment of dogs is a vast and recession-resistant business: there
are 80 million dogs in the United States alone, and even in recession most
people keep spending on their pets. Research into the treatment of cancer
in dogs also has relevance to the treatment of cancer in humans,
particularly because it relates to spontaneous cancer which occurs in a
domestic environment. "Lymphoma is one of the most common canine cancers,
representing 5% of all malignancies. It has an annual incidence on 25
cases per 100,00 dogs" [5.7].
[text removed for publication] has developed a lymphoma blood test, [text
removed for publication], [5.4, 5.5] which gives vets an
easier, less stressful, cheaper and quicker way of testing for lymphoma.
This means that dogs are more likely to be tested for lymphoma when any
suspicious symptoms show, and that results of the tests are available
quickly — generally the same day. If lymphoma is caught early on it can be
treated quickly. While researchers do not talk of a "cure" for lymphoma,
early treatment can produce a healthier dog for longer, adding 12 months
to two years to a dog's average 12-year lifespan.
The blood test was developed from serum samples collected from several
veterinary practices. The samples were fractionated and analysed by mass
spectrometry. Two protein peaks, with the highest diagnostic power, were
selected and further identified as acute phase proteins, C-Reactive
Protein and Haptoglobin. Data mining methods were then applied to the
collected data for the development of our online computer-assisted
veterinary diagnostic tool.
After testing of more than 2,000,000 versions of the combinations of the
known and original data mining approaches, the best solutions were found.
It is tested on the clinical data of several veterinary clinics worldwide.
The generated software is a tool for diagnostic, monitoring and screening.
Initially, the diagnosis of lymphoma was formulated as a classification
problem and then later refined as a lymphoma risk estimation. Three
classical methods, decision trees, advanced kNN and probability density
evaluation, were used in combinations with original approaches for
classification and risk estimation and several pre-processing approaches
were implemented to create the diagnostic system.
For the differential diagnosis the best solution gave a sensitivity and
specificity of 83.5% and 77%, respectively (using three input features,
CRP, Haptoglobin and standard clinical symptom). For the screening task,
the decision tree method provided the best result, with sensitivity and
specificity of 81.4% and >99%, respectively (using the same input
features). Furthermore, the development and application of new techniques
for the generation of risk maps allowed the visualisation of risk maps in
a more user-friendly manner.
This is a potentially useful tool for explanatory data analysis and
testing other theoretical input features in the final diagnosis. The risk
maps provide early diagnosis of lymphoma return several weeks before the
clinical symptoms are developed. In this monitoring lymphoma return the
risk maps perform significantly better than most of the veterinary
practitioners. The generated lymphoma software (JAVA) has the potential of
being web-accessible.
In a letter to the Vice-Chancellor of the University of Leicester from
[text removed for publication] reports "The new treatment monitoring test
has the potential to add a further [text removed for publication] to our
projected turnover. It has also bought forward the collaboration with the
largest veterinary corporation in the UK who were specifically interested
in the treatment monitoring application of our test. They are now planning
to launch the new test developed with University of Leicester which will
have an immediate impact on both our short and long term revenues" [5.1].
In short — this system is significantly changing veterinary practice in
the UK.
Sources to corroborate the impact
- Factual statement by [text removed for publication]
- Factual statement from Director of U900 Institut Curie and references
to the clinical projects.
- E. M. Mirkes, I. Alexandrakis, K. Slater, R. Tuli, A. N. Gorban,
Computational Diagnosis of Canine Lymphoma, Presented at the conference
IC-MSQUARE 2013, Prague September 2013 (Short version is published in
the Book of Abstracts IC-MSQUARE 2013), Accepted for publication in
IC-MSQUARE 2013 Proceedings (IOP Conference series), extended version is
invited to the Special Issue of Physics in Medicine and Biology.
Preprint version is published in arXiv: arXiv:1305.4942 [q-bio.QM]
- Canine lymphoma blood tests — results explained, [text removed for
publication], internal publication.
- Guidance notes for [text removed for publication], the canine lymphoma
blood test system.
- Ivchenko V.K., Galchenko V.Ya., Ivchenko A.V.: Part I: Prediction of
treatment result of long bones fracture for diabetes patients by means
of intellectual and statistical data analysis. Part I. Visual data
mining for multidimensional data, Ukrainian Medical Almanac , 2013, Vol.
16, Iss. 2 (Supplement), pp. 4-7; Part II. Production of prognostic
classification rules, Ukrainian Medical Almanac , 2013, Vol. 16, Iss. 2
(Supplement), pp. 8-11; Part III. Analysis of efficiency of produced
prognostic classification rules, Ukrainian Medical Almanac , 2013, Vol.
16, Iss. 2 (Supplement), pp. 12-15.
- [text removed for publication]
- Bécavin C, Benecke A. New dimensionality reduction methods for the
representation of high dimensional 'omics' data. Expert Rev Mol
Diagn. 11(1) (2011), 27-34.