### Data maps with applications to medical diagnostics and monitoring

**Submitting Institution**

University of Leicester**Unit of Assessment**

Mathematical Sciences**Summary Impact Type**

Technological**Research Subject Area(s)**

Mathematical Sciences: Pure Mathematics, Statistics

Information and Computing Sciences: Computation Theory and Mathematics

**Download original**

PDF**Summary of the impact**

Advanced technologies for data visualisation and data mining, developed in the Unit in collaboration with national and international teams, are widely applied for development of medical services. In particular, a system for canine lymphoma diagnosis and monitoring developed with [text removed for publication] has now been successfully tested using clinical data from several veterinary clinics. The risk maps produced by our technology provide early diagnosis of lymphoma several weeks before the clinical symptoms develop. [text removed for publication] has estimated the treatment test, named [text removed for publication], developed with the Unit to add [text removed for publication] to the value of their business. Institute Curie (Paris), applies this data mapping technique and the software that has been developed jointly with Leicester in clinical projects.

**Underpinning research**

The problems related to *large data set analysis and visualisation,
model reduction and the struggle with complexity of data sets* are
important for many areas of human activity. The identification of hidden
geometry and topology in noisy data sets is a challenging task. Many
branches of data analysis aim to solve such problems under some additional
assumptions that simplify the problem. However, the verification of these
assumptions may be more complicated than the solutions of the problems. A
universal technology for uncovering the hidden structure is very
desirable. An answer to this challenge cannot be simple because it must
potentially cover the majority of situations.

In the 1990s Levesley and Light produced theoretical results concerning
the approximation power of neural networks. This work led to Levesley's
involvement in a simple neural network model for the prediction of acute
rejection of kidney transplants together with pathologists from the
University of Leicester [**3**.7]. The Unit recognised the potential
for impact of research in this area, leading to the development of a team
under the leadership of Gorban with more specific expertise in the theory
and practical application of neural networks.

In summary, we have developed a universal technology for revealing and visualising the hidden structure in data. For this purpose, we have used ideas both old and new:

- The oldest of them is the idea of self-consistency introduced by H. Steinhaus in 1957 (k-means) and then recognized as a very general and productive idea that can be used for construction of many principal objects like principal manifolds and principal graphs (Husty at al, 1984). This idea is an intrinsic part of the self-organizing maps (SOM) and many data approximation approaches also.
- The application of quadratic elastic energy functionals is a basic
idea in spline approximation and is used by us for construction of
principal manifolds, in the elastic maps technology [
**3**.6]. - Gorban and Zinovyev (Curie) developed the topological grammars
approach for data analysis [
**3**.3] based on the idea of graph grammars. - We use the pluriharmonic embeddings of graphs into data space as the
ideal approximators [
**3**.5] and developed optimization methods to minimize the deviation of data approximants from the pluriharmonic graphs. - The idea of robust growth makes the whole approach more efficient. For the organization of robust grows, we use truncated energy functionals. In the splitting algorithms of optimization they also produce systems of linear equations, and make the construction of the approximators much more stable in presence of noise and outliers.

Most of the ideas are implemented in user-friendly software and can be applied to many real-life problems.

For the development of applied systems we combine our original technology with more classical approaches like decision trees, advanced kNN method and Bayesian networks. For example, for the canine lymphoma diagnosis we have tested more than 2,000,000 versions of combinations of known and our novel data mining approaches, and the best solutions have been implemented in JAVA (web-accessible) software. It is shown that for the differential diagnosis of clinically vulnerable patients, the sensitivity (proportion of correct prediction of positive results) of the system is 83.5%, and specificity (proportion of correct prediction of negative results) is 77%. For caninelymphoma screening purposes, the best data mining solution we found has sensitivity 81.4% and specificity >99%.

On base of case-study, which has been done, the best solution for each
problem has been selected. The results obtained from case-study are
extremely favourable compared to many current human cancer screening tests
that rely upon single biomarkers. These include the current CA-125 screen
for human ovarian cancer (sensitivity approximately 50% and specificity
98% [**3**.1]) and the male PSA test (sensitivity approximately 65% and
specificity 35% [**3**.2]).

**References to the research**

__Publications__

(1) A.N. Gorban, A. Zinovyev, Principal manifolds and graphs in practice:
from molecular biology to dynamical systems, *International Journal of
Neural Systems* 20 (3) (2010), 219-232.

(2) A.N. Gorban, A. Y. Zinovyev, Principal Graphs and Manifolds, Chapter
2 in: *Handbook of Research on Machine Learning Applications and
Trends: Algorithms, Methods, and Techniques,* Emilio Soria Olivas et
al. (eds), IGI Global, Hershey, PA, USA, 2009, pp. 28-59.

(3) A.N. Gorban, N.R. Sumner, and A.Y. Zinovyev, Topological grammars for
data approximation, *Applied Mathematics Letters,* 20 (4) (2007),
382-386.

(4) A. Zinovyev, E. Mirkes, Data complexity measured by principal graphs,
*Computers & Mathematics with Applications,* Volume 65, Number
10, 1471-1482.

(5) A.N. Gorban, B. Kegl, D. Wunsch, A. Zinovyev (Eds.), *Principal
Manifolds for Data Visualisation and Dimension Reduction*, Lecture
Notes in Computational Science and Engineering, Vol. 58, Springer, Berlin
— Heidelberg — New York, 2008. (ISBN 978-3-540-73749-0).

(6) A. Gorban, A. Zinovyev, Elastic Principal Graphs and Manifolds and
their Practical Applications, *Computing* 75 (2005), 359-379.

(7) Furness
PN, Levesley
J, Luo
Z, Taub
N, Kazi
JI, Bates
WD, Nicholson
ML., A neural network approach to the biopsy diagnosis of early
acute renal transplant rejection, *Histopathology*, Volume 35
(1999), 461-467.

__Grant__

Data Mining for Lymphoma Differential Diagnosis, A University of Leicester Innovation Partnership with [text removed for publication], 2012. European Regional Development Fund.

**Details of the impact**

Joint work with Institute Curie (Paris, France) started in 2004. This is
one of the top European cancer research and treatment centres. Together
with the Bioinformatics Unite of Institute Curie, we have developed a
software library which implements most of our methods. This software is
now open for non-commercial use worldwide [**5**.2]. Institute Curie
uses this software in various projects for visualization and analysis of
microarrays for various types of cancer, for visualization of clinical and
biochemical data [**5**.2].

Publication [**5**.3] demonstrates knowledge transfer impact as the
IC-MSQUARE conference is dedicated to application of mathematics in other
science and technology, and the author list of the paper has two member of
the University (Gorban and Mirkes) and three colleagues from [text removed
for publication] (Alexandris, Slater and Tuli).

*Use in Humans*

Many institutions and clinics in various countries have reported
successful use of these methods and software for clinical purposes [**5**.2]:

- The Ukrainian Medical Almanac [
**5**.6] reported two new applications: (i) Prediction of treatment result of long bones fracture for diabetes patients, (ii) Pain management and quantitative estimation of pain. - Dr. Arndt Benecke (joint affiliation at Institut de Génétique et de
Biologie Moléculaire et Cellulaire, CNRS/INSERM/ULP, Collège de France
and Institut des Hautes Etudes Scientifique, France) used the method of
elastic maps for analysis of microarray data in cancer. This experience
was reflected in the subsequent publication [
**5**.8].

*Use in Animals*

The treatment of dogs is a vast and recession-resistant business: there
are 80 million dogs in the United States alone, and even in recession most
people keep spending on their pets. Research into the treatment of cancer
in dogs also has relevance to the treatment of cancer in humans,
particularly because it relates to spontaneous cancer which occurs in a
domestic environment. "Lymphoma is one of the most common canine cancers,
representing 5% of all malignancies. It has an annual incidence on 25
cases per 100,00 dogs" [**5**.7].

[text removed for publication] has developed a lymphoma blood test, [text
removed for publication], **[5**.4, **5**.5] which gives vets an
easier, less stressful, cheaper and quicker way of testing for lymphoma.
This means that dogs are more likely to be tested for lymphoma when any
suspicious symptoms show, and that results of the tests are available
quickly — generally the same day. If lymphoma is caught early on it can be
treated quickly. While researchers do not talk of a "cure" for lymphoma,
early treatment can produce a healthier dog for longer, adding 12 months
to two years to a dog's average 12-year lifespan.

The blood test was developed from serum samples collected from several veterinary practices. The samples were fractionated and analysed by mass spectrometry. Two protein peaks, with the highest diagnostic power, were selected and further identified as acute phase proteins, C-Reactive Protein and Haptoglobin. Data mining methods were then applied to the collected data for the development of our online computer-assisted veterinary diagnostic tool.

After testing of more than 2,000,000 versions of the combinations of the known and original data mining approaches, the best solutions were found. It is tested on the clinical data of several veterinary clinics worldwide. The generated software is a tool for diagnostic, monitoring and screening. Initially, the diagnosis of lymphoma was formulated as a classification problem and then later refined as a lymphoma risk estimation. Three classical methods, decision trees, advanced kNN and probability density evaluation, were used in combinations with original approaches for classification and risk estimation and several pre-processing approaches were implemented to create the diagnostic system.

For the differential diagnosis the best solution gave a sensitivity and specificity of 83.5% and 77%, respectively (using three input features, CRP, Haptoglobin and standard clinical symptom). For the screening task, the decision tree method provided the best result, with sensitivity and specificity of 81.4% and >99%, respectively (using the same input features). Furthermore, the development and application of new techniques for the generation of risk maps allowed the visualisation of risk maps in a more user-friendly manner.

This is a potentially useful tool for explanatory data analysis and testing other theoretical input features in the final diagnosis. The risk maps provide early diagnosis of lymphoma return several weeks before the clinical symptoms are developed. In this monitoring lymphoma return the risk maps perform significantly better than most of the veterinary practitioners. The generated lymphoma software (JAVA) has the potential of being web-accessible.

In a letter to the Vice-Chancellor of the University of Leicester from
[text removed for publication] reports "The new treatment monitoring test
has the potential to add a further [text removed for publication] to our
projected turnover. It has also bought forward the collaboration with the
largest veterinary corporation in the UK who were specifically interested
in the treatment monitoring application of our test. They are now planning
to launch the new test developed with University of Leicester which will
have an immediate impact on both our short and long term revenues" [**5**.1].

In short — this system is significantly changing veterinary practice in the UK.

**Sources to corroborate the impact **

- Factual statement by [text removed for publication]
- Factual statement from Director of U900 Institut Curie and references to the clinical projects.
- E. M. Mirkes, I. Alexandrakis, K. Slater, R. Tuli, A. N. Gorban, Computational Diagnosis of Canine Lymphoma, Presented at the conference IC-MSQUARE 2013, Prague September 2013 (Short version is published in the Book of Abstracts IC-MSQUARE 2013), Accepted for publication in IC-MSQUARE 2013 Proceedings (IOP Conference series), extended version is invited to the Special Issue of Physics in Medicine and Biology. Preprint version is published in arXiv: arXiv:1305.4942 [q-bio.QM]
- Canine lymphoma blood tests — results explained, [text removed for publication], internal publication.
- Guidance notes for [text removed for publication], the canine lymphoma blood test system.
- Ivchenko V.K., Galchenko V.Ya., Ivchenko A.V.: Part I: Prediction of treatment result of long bones fracture for diabetes patients by means of intellectual and statistical data analysis. Part I. Visual data mining for multidimensional data, Ukrainian Medical Almanac , 2013, Vol. 16, Iss. 2 (Supplement), pp. 4-7; Part II. Production of prognostic classification rules, Ukrainian Medical Almanac , 2013, Vol. 16, Iss. 2 (Supplement), pp. 8-11; Part III. Analysis of efficiency of produced prognostic classification rules, Ukrainian Medical Almanac , 2013, Vol. 16, Iss. 2 (Supplement), pp. 12-15.
- [text removed for publication]
- Bécavin C, Benecke A. New dimensionality reduction methods for the
representation of high dimensional 'omics' data.
*Expert Rev Mol Diagn*. 11(1) (2011), 27-34.