Using genomics to shape high performance computing
Submitting Institution
University of South Wales
Unit of Assessment
Mathematical Sciences
Summary Impact Type
Technological
Research Subject Area(s)
Mathematical Sciences: Statistics
Biological Sciences: Genetics
Information and Computing Sciences: Computation Theory and Mathematics
Summary of the impact
Prof. Ron Wiltshire and Dr. Tatiana Tatarinova helped to develop
High Performance Computing Wales (HPC Wales) by providing "test" problems
from the life sciences (bioinformatics), defining key software packages,
and bringing on board collaborative partners to instigate project areas.
Established in 2010 with £44 million funding, HPC Wales is Europe's first
nation-wide "hub-and-spoke" super-computing facility. In the three years
following its inception, HPC Wales has attracted approximately 50
collaborative organisations supporting over 60 major projects, 40% of them
in the life sciences and 5% in bioinformatics through Wiltshire's and
Tatarinova's involvement. Further impact arose through the redesign of software and
up-skilling of HPC Wales staff.
Underpinning research
The problem
The recent construction of numerous genomic databases has resulted in
lengthy lists of millions or billions of chemical units (nucleotides) that
comprise a single organism's genome. Buried in these lists are the
organism's genes (e.g. it is estimated humans have 20 000-25 000 genes).
Searching for known genes within such lists and relating them to their
function allows researchers to predict how a particular organism will fare
under given conditions (e.g. how a certain person will respond to a
particular drug or how a given plant will behave when introduced to a
specific soil-type). Hence, targeted and individual-based treatments are
possible provided there is sufficient knowledge of the organism's genetic
components. However, searching for these genes within the genome is a
computationally intensive task and is the focus of bioinformatics.
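As a minimal sketch of what such a search involves, the Python snippet below looks for exact copies of a known gene on either strand of a genome held as a plain string. The sequences and the function names `reverse_complement` and `find_gene` are invented for illustration and are not taken from the research described here; practical tools must also tolerate mismatches and work on sequences of millions or billions of characters, which is where the computational cost arises.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    complement = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(complement[base] for base in reversed(seq))


def find_gene(genome: str, gene: str) -> list[tuple[int, str]]:
    """Find every exact occurrence of `gene` on either strand of `genome`.

    Returns (0-based start position on the forward strand, strand) pairs.
    """
    hits = []
    for strand, pattern in (("+", gene), ("-", reverse_complement(gene))):
        start = genome.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(pattern, start + 1)  # allow overlapping hits
    return sorted(hits)


# Toy data, invented for illustration only.
genome = "TTGACCATGGCTAAGTTTAGCCATGGTCAA"
gene = "ATGGCT"
print(find_gene(genome, gene))  # → [(6, '+'), (18, '-')]
```

Each call scans the whole genome once per strand, so the cost grows with both the genome length and the number of genes searched for.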
The research
Locating and identifying known genes in an organism's genome requires a
range of parametric and non-parametric statistical approaches to compare
data sets, coupled with the development of novel algorithms involving the
integration of multiple data types and subsequent efficient processing of
those data-rich sets. To this end, significant progress has been made by
Dr. Tatiana Tatarinova at the University of South Wales, focussing on
partial sequencing of certain plant species with their relatively small
sequence length. Tatarinova has published in the region of 40 articles and
books and is co-author of over a dozen patents involving the modification
of genes in plant crops. Her contribution to the discipline was recognised
in the 2010 "International Review of Mathematics in the UK" and she has
been rated as one of Wales's 30 most promising scientists by Welsh
Crucible.
Analysing full genome sequences is not feasible using desktop computers
due to the sheer scale of the data sets. Instead this requires access to
high-performance computing alongside the development of appropriate
software exploiting the highly-parallelised architecture of such
facilities. Thus in 2010 when HPC Wales was launched, Tatarinova's
research was a natural fit. Indeed, Tatarinova has a history of working
for and alongside many of the world's leading bio-agricultural companies,
for example Monsanto Co., and so has a successful track record in
academic and commercial collaborations.
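One common way to exploit a parallel architecture for this kind of problem, sketched below under stated assumptions, is to cut the sequence into overlapping windows and search each window independently. The functions `split_with_overlap`, `find_in_window` and `parallel_find` are hypothetical names introduced for this example, and a thread pool stands in for the separate processes or compute nodes a real facility would use; nothing here is claimed to reflect the software actually deployed on HPC Wales.

```python
from concurrent.futures import ThreadPoolExecutor


def split_with_overlap(length, chunk_size, overlap):
    """Return (start, end) windows covering [0, length).

    Each window extends `overlap` characters into the next one so that a
    match straddling a chunk boundary is still seen whole by one worker.
    """
    windows = []
    start = 0
    while start < length:
        windows.append((start, min(start + chunk_size + overlap, length)))
        start += chunk_size
    return windows


def find_in_window(genome, pattern, window):
    """Return genome-wide start positions of `pattern` inside one window."""
    start, end = window
    segment = genome[start:end]
    hits = []
    pos = segment.find(pattern)
    while pos != -1:
        hits.append(start + pos)
        pos = segment.find(pattern, pos + 1)
    return hits


def parallel_find(genome, pattern, chunk_size, workers=4):
    """Search all windows concurrently and merge the results."""
    # An overlap of len(pattern) - 1 means each match is found exactly once.
    windows = split_with_overlap(len(genome), chunk_size, len(pattern) - 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_window = list(
            pool.map(lambda w: find_in_window(genome, pattern, w), windows)
        )
    return sorted(pos for hits in per_window for pos in hits)


# Toy data, invented for illustration only.
print(parallel_find("ACGTACGTACGT", "GTA", chunk_size=4))  # → [2, 6]
```

Because the windows are independent, the work scales out naturally: more workers can be added without changing the search itself, only the way results are gathered.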
The impact area
HPC Wales, Europe's first nationally-distributed supercomputing network,
was established in partnership with a number of Welsh universities, the
Welsh Government and Fujitsu in 2010. The projects it supports and its
long-term direction were influenced by its early adopters. Tatarinova's
research in genomics required vast amounts of computational time and was a
natural fit with HPC Wales. Therefore Tatarinova, with support from Wiltshire,
contributed to the development of HPC Wales by providing important
research problems that were used to test and refine the system's hardware
and software. Furthermore, Tatarinova and Wiltshire brought on
board a number of collaborative organisations through the promotion of
their research specialities. To promote projects in this area, and as
evidence of their support for this work, Fujitsu funded a research
studentship while HPC Wales have since funded two further studentships on
related projects supervised by Tatarinova.
References to the research
Research Publications (these publications defined projects and
software used in HPC Wales; the de Vere et al. paper arose from
collaborative work that included the utilization of HPC Wales):
• Tatarinova, T.V.; Alexandrov, N.; Bouck, J.; Feldmann, K. (2010). GC3
biology in corn, rice, sorghum and other grasses. BMC Genomics, 11:
308. doi: 10.1186/1471-2164-11-308
• Sablok, G.; Chandra Nayak, K.; Vazquez, F.; Tatarinova, T.V. (2011).
Synonymous codon usage, GC3, and evolutionary patterns across
plastomes of three pooid model species: emerging grass genome models for
monocots. Molecular Biotechnology, 49(2): 116-128. doi:
10.1007/s12033-011-9383-9
• de Vere, N.; Rich, T.C.G.; Ford, C.R.; Trinder, S.A.; Long, C.; Moore,
C.W.; Satterthwaite, D.; Davies, H.; Allainguillaume, J.; Ronca, S.;
Tatarinova, T.; Garbett, H.; Walker, K.; Wilkinson, M.J. (2012). DNA
Barcoding the Native Flowering Plants and Conifers of Wales. PLoS ONE,
7(6): e37945. doi:10.1371/journal.pone.0037945
Research Grants/Funding (selection of research studentships
awarded to Tatarinova that are related to impact on HPC Wales):
• £56K from Fujitsu & HPC Wales for 1 research studentship (No grant
number, can provide collaborative agreement on request, but see http://www.hpcwales.co.uk/farzana-studentship)
• 2 x KESS research studentships with Morvus Technology Ltd and the
National Botanic Garden of Wales, both in collaboration with HPC Wales
(Projects "Development of theoretical cell models for the action of novel
antiapoptotic proteins" and "Development of bioinformatics tools and
procedures for computational support", respectively)
• £2.5k from Welsh Crucible for "My GENE Code PEiMB: Public Engagement in
Molecular Biology". This Award required selection by the University and
attendance at numerous workshops throughout Wales engaging with
multi-disciplinary researchers.
Details of the impact
HPC Wales Background
High Performance Computing Wales (HPC Wales), launched in July 2010, is a
£44 million five-year project providing a super-computing facility at a
scale unprecedented in Europe due to its unique "hub-and-spoke" design.
Briefly, a number of "hubs" are distributed throughout the nation and are
joined by high-speed connections. Financial support was provided by ERDF
and ESF European funds, the Department for Business, Innovation and
Skills, collaborating academic institutions (including USW), the Welsh
Assembly Government and the private sector.
In early 2011, Fujitsu won the procurement bid to provide infrastructure,
support and services. The nature of the projects, the application areas
and the necessary equipment were to be shaped by its early adopters since
they would be providing real-world problems that would test and refine the
system. Associated with the procurement process, Fujitsu re-invested
approximately £1.5 million to support projects led by early adopters.
Wiltshire identified a number of early adopters, including
Tatarinova, who provided sample problems that shaped current application
areas, trained HPC Wales staff and introduced further collaborators from
commercial organisations, thus helping to expand employment and develop
technical skills. Indeed, HPC Wales employs approximately 30 managerial,
technical and support staff and it is planned to generate a further 400
jobs requiring high level skills and technical training.
Impact on HPC Wales
Tatarinova's "test" project, started in 2011, involved the analysis of
large genome sequences using a combination of standard software coupled
with the implementation of novel algorithms. Funded by a £56K award from
Fujitsu, the project helped HPC Wales to understand the type of problems
they would be supporting and how to manage their hardware and software
appropriately.
This work was so influential that in September 2012 HPC Wales rated it as
one of their top 5 projects.
The success of this project resulted in two further Tatarinova-led
projects supported by HPC Wales, each worth £15k, and running in
collaboration with non-academic partners including the National Botanic
Garden of Wales (the Barcode Wales project) and Morvus Technology
Ltd., investigating a range of problems including the analysis of strains
of Escherichia coli.
Tatarinova and Wiltshire actively promoted HPC Wales by
presenting at numerous promotional events and giving media interviews.
Further projects and collaborators, both within and outside of academia,
arose as a result of these promotional activities (e.g. OSTC Ltd. have two
projects utilising HPC Wales working with Roach), thus helping to
secure the future of HPC Wales and its associated employment.
As of October 2013, and in only three years from its inception, in excess
of 50 commercial businesses and industrial partners are engaged in over 60
collaborative projects using HPC Wales; 40% of these involving the life
sciences and approximately 5% on problems in genomics and bioinformatics.
A key feature of HPC Wales is its accessibility to businesses, such as
those brought on board by Wiltshire and Tatarinova, and HPC Wales'
success in this area was acknowledged through being a runner-up in the Open
Data category of the national Next Generation Digital Challenge Awards in
October 2013.
Impact on supercomputing and bioinformatics
As part of Tatarinova's "test" project, certain bioinformatics software (cisExpress)
required redesigning to work on the highly-parallelised architecture of
HPC Wales. Fujitsu provided the equivalent of approximately £90k in
consultancy fees to support the redesign of key algorithms that not only
improved essential software for the analysis of large genome sequences,
but also resulted in training HPC Wales staff.
Since early 2013, this revised software, running on HPC Wales, has been
made available to the world-wide scientific community (http://glacombio01.comp.glam.ac.uk/cisExpress/new/home.php)
and as of October 2013, approximately 50% of the total simulations
performed (numbering several hundred) have been conducted by scientists
external to USW. This revised software has therefore laid foundations for
further bioinformatics projects around the world with HPC Wales and USW
researchers at their core.
Creating impact legacy
Tatarinova's work with HPC Wales has been designed to create a lasting
impact. Barcode Wales has constructed a DNA database for the
native flowering plants and conifers of Wales and represents the most
comprehensive sampling of any national flora to date. Thus, future
environmental surveys need not rely on a taxonomic expert to determine
flora and can instead use sampled DNA and associated software developed
from this project. In January 2013, Tatarinova, alongside Prof. Denis
Murphy (USW), created a spin-out company called myregulome.com Ltd,
which is currently marketing software for genomic analysis developed in
collaboration with Fujitsu and HPC Wales.
Sources to corroborate the impact
Shaping HPC Wales:
- http://www.hpcwales.co.uk/sites/default/files/hpcwales/HPCWales_NewsByte0112.pdf
Details of Fujitsu providing facilities and support for HPC Wales;
Tatarinova defining HPC Wales software.
- http://www.hpcwales.co.uk/who-we-are
Current list of staff employed directly by HPC Wales in management,
support and technical roles.
- http://www.bbc.co.uk/news/10587005
Job creation estimates and funding sources for HPC Wales.
- http://www.fujitsu.com/downloads/TC/sc12/booth/hpc-wales-fujitsubp-sc12.pdf
Priority areas of HPC Wales, including a list of customers and customer
feedback, demonstrating Tatarinova's influence on shaping HPC Wales.
Statement that Tatarinova's "test" project is one of HPC Wales' top 5
projects.
- http://www.welshcrucible.org.uk/2011-programme/participants/welsh-crucible-participant-profiles/western-mail-profiles-2011-series/western-mail-profiles-dr-tatiana-tatarinova-university-of-glamorgan/
How HPC Wales' projects have been shaped by Tatarinova.
- http://www.hpcwales.co.uk/sites/default/files/hpcwales/20120905_software_list.pdf
Details of software installed on HPC Wales including significant
genomics-related programs.
Promoting HPC Wales and introducing business partners:
- http://www.bbc.co.uk/programmes/b01pw57l
Radio interview on BBC Radio Wales on Science Cafe, broadcast 15th
January, 2013, with Tatarinova promoting HPC Wales and explaining how
she has helped define its application areas.
- http://www.hpcwales.co.uk/fighting-cancer
Video link and case study — reference to algorithm being redesigned in
collaboration with Fujitsu, HPC Wales and other software developers.
Impact on bioinformatics:
- International Review of Mathematical Sciences in the United Kingdom http://www.epsrc.ac.uk/SiteCollectionDocuments/other/MathsIR2010EvidenceDocumentsParts1-3.pdf (p.173)
"The analysis of properties of genes in various plant species by
Tatarinova (Glamorgan) and colleagues has provided major insight for
tissue specific and stress response in plants and resulted in patents in
areas of plant biotechnology"
Corroborating sources:
- HPC Wales
Can corroborate Wiltshire's and Tatarinova's involvement in
defining software and projects for HPC Wales.