More cost-effective drug discovery using virtual screening
Submitting Institution
University of Sheffield
Unit of Assessment
Communication, Cultural and Media Studies, Library and Information Management
Summary Impact Type
Technological
Research Subject Area(s)
Mathematical Sciences: Statistics
Information and Computing Sciences: Computation Theory and Mathematics, Information Systems
Summary of the impact
The discovery of a new drug can take 10 years, cost in excess of
one billion dollars and involve synthesising and testing thousands of
possible drug molecules. Virtual (i.e., computer-based) screening is used
in the early stages of drug discovery to focus attention on those
molecules in a chemical database that are most likely to exhibit the
required drug action, and that are hence priority candidates for further,
more detailed study. Virtual screening thus increases the
cost-effectiveness of pharmaceutical research and helps to bring novel
drugs to patients more quickly.
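To make the idea of "focusing attention" concrete, here is a minimal Python sketch of the final prioritisation step only. It assumes each molecule has already been given a predicted-activity score by some docking or pharmacophore method; the `Candidate` class, the `shortlist` function and all scores are hypothetical and are not part of GOLD, GASP or GALAHAD.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    score: float  # hypothetical predicted-activity score; higher is better


def shortlist(database: list[Candidate], fraction: float) -> list[Candidate]:
    """Return the best-scoring `fraction` of the database, best first,
    as the priority candidates for more detailed (and costly) study."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(database, key=lambda c: c.score, reverse=True)
    keep = max(1, round(len(ranked) * fraction))  # always keep at least one
    return ranked[:keep]


database = [Candidate("mol-A", 0.2), Candidate("mol-B", 0.9),
            Candidate("mol-C", 0.5), Candidate("mol-D", 0.7)]
print([c.name for c in shortlist(database, 0.5)])  # ['mol-B', 'mol-D']
```

Only the shortlisted molecules would then be synthesised and tested, which is where the cost saving comes from.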
Work in Sheffield since 1993 on virtual screening has resulted in three
computer programs that enable much more effective screening than was
previously possible and that are now used throughout the worldwide
pharmaceutical industry: GALAHAD (Genetic Algorithm with Linear
Assignment for the Hypermolecular Alignment of Datasets), GASP (Genetic
Algorithm Superimposition Program) and GOLD (Genetic Optimization for
Ligand Docking).
Underpinning research
Background
Two of the most important virtual screening techniques are ligand
docking and pharmacophore mapping. These both focus on the
three-dimensional (3D) structures of molecules and are computationally
demanding, especially when molecules are conformationally flexible,
i.e., can adopt a range of 3D shapes when they interact with a biological
macromolecular target, typically a specific protein in the body.
Ligand docking (the focus of GOLD) involves identifying those molecules
in a database that fit a target, in much the same way as a key fits into a
lock, since those with a good fit may be biologically active.
Pharmacophore mapping (the focus of GASP and GALAHAD) involves identifying
the structural features common to molecules that have already been shown
to be biologically active.
Sheffield research
Docking and pharmacophore mapping with flexible molecules are examples of
combinatorial optimization problems for which efficient, conventional
algorithms are not available. However, they are very well suited to
genetic algorithm (GA) techniques, the use of which in chemoinformatics
was pioneered in Sheffield work in the early Nineties and which form the
basis of GOLD, GASP and GALAHAD.
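The paragraph above describes flexible docking as a combinatorial optimization problem well suited to GAs. The following is a minimal Python sketch of a GA for such a problem, under clearly stated assumptions: the chromosome is a list of torsion angles restricted to hypothetical 30-degree steps, and `fitness` is a made-up stand-in that rewards closeness to an invented set of "ideal" torsions rather than a real protein-ligand score. It is not GOLD's encoding, operators or scoring function; all names are illustrative.

```python
import random

# Allowed torsion values in degrees: a hypothetical 30-degree discretisation.
ANGLES = list(range(0, 360, 30))


def angular_gap(a: int, b: int) -> int:
    """Smallest difference between two angles on a circle, in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def fitness(chromosome: list[int], target: list[int]) -> int:
    """Toy stand-in for a docking score (NOT GOLD's scoring function):
    0 is a perfect fit and more negative values are worse fits."""
    return -sum(angular_gap(a, t) for a, t in zip(chromosome, target))


def evolve(target: list[int], pop_size: int = 40, generations: int = 200,
           p_mut: float = 0.1, seed: int = 0) -> list[int]:
    """Evolve torsion-angle chromosomes using binary tournament selection,
    one-point crossover, per-gene mutation and elitism; return the fittest."""
    rng = random.Random(seed)
    n = len(target)
    pop = [[rng.choice(ANGLES) for _ in range(n)] for _ in range(pop_size)]

    def tournament() -> list[int]:
        # The fitter of two randomly chosen individuals is selected as a parent.
        a, b = rng.sample(pop, 2)
        return a if fitness(a, target) >= fitness(b, target) else b

    for _ in range(generations):
        best = max(pop, key=lambda c: fitness(c, target))
        children = [best[:]]  # elitism: carry the best chromosome over unchanged
        while len(children) < pop_size:
            mum, dad = tournament(), tournament()
            cut = rng.randrange(1, n) if n > 1 else 0
            child = mum[:cut] + dad[cut:]  # one-point crossover
            # Each gene is replaced by a random allowed angle with probability p_mut.
            child = [rng.choice(ANGLES) if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    return max(pop, key=lambda c: fitness(c, target))


target = [60, 180, 270, 30, 300]  # hypothetical "ideal" torsions for a made-up site
best = evolve(target)
print(best, fitness(best, target))
```

Because the best chromosome is always carried forward (elitism), the best fitness in the population never gets worse from one generation to the next.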
The GA in the Sheffield GOLD program identifies the best possible fit of
a flexible molecule into a target protein. This was an entirely novel
approach when the prototype program was first described in 1995 [R1],
following a collaboration between Willett (Professor of Information
Science since 1991) and Wellcome (now GlaxoSmithKline, or GSK). Extensive
development and testing [R2] in an MRC/DTI LINK project at Sheffield
(£205K, 1995-97) resulted in a program that was rapidly adopted by
industry once it was distributed by the Cambridge Crystallographic Data
Centre (CCDC, the industrial partner in the LINK project).
The GA in the Sheffield GASP program is analogous to that in GOLD: the
latter tests the fit of a molecule to a protein while the former tests the
fit of a series of active molecules to each other [R3]. GASP was developed
during the GOLD project and was subsequently commercialized by a US
chemoinformatics company, Tripos Inc. (St Louis, MO), to complement an
existing product, which was based on Sheffield research in the late
Eighties and which was less well-suited than GASP to the handling of
flexible molecules. A subsequent funded collaboration between Willett and
Tripos (£121K, 2002-04) led to a novel method for aligning the 3D
structures of pairs of molecules [R4] that lies at the heart of a further
pharmacophore program, GALAHAD [R5], that Tripos now distributes. GALAHAD
also makes use of a multiobjective GA, an approach to optimization
that was first applied to chemoinformatics by Gillet (then Senior
Lecturer, and Professor of Chemoinformatics since 2009) in a
collaboration with GSK (£129K, 1999-2002) [R6]. This approach has since been
successfully demonstrated in a range of other chemoinformatics
applications.
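A multiobjective GA needs a way to compare candidates that trade one objective off against another. Below is a minimal Python sketch of one well-known approach, the Pareto ranking used in Fonseca and Fleming's MOGA, in which each individual's rank is one plus the number of individuals that dominate it. Whether [R6] uses exactly this scheme is an assumption here, and the two objectives and their values are hypothetical (both are minimised).

```python
def dominates(a: tuple[float, ...], b: tuple[float, ...]) -> bool:
    """True if `a` is no worse than `b` on every objective and strictly
    better on at least one (all objectives are to be minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_ranks(objectives: list[tuple[float, ...]]) -> list[int]:
    """Rank = 1 + number of individuals that dominate it; rank 1 marks the
    non-dominated individuals. No individual dominates itself, so no
    self-exclusion is needed."""
    return [1 + sum(dominates(other, me) for other in objectives)
            for me in objectives]


# Hypothetical library designs scored as (cost, predicted toxicity).
designs = [(3.0, 0.2), (1.0, 0.5), (2.0, 0.6), (4.0, 0.1)]
print(pareto_ranks(designs))  # [1, 1, 2, 1]
```

Selection can then favour low ranks, so the population spreads out along the trade-off between objectives instead of collapsing onto a single weighted compromise.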
The high quality of the research carried out by the Sheffield
chemoinformatics group of Gillet, Willett and Holliday (Senior Research
Manager since 1999) is further evidenced by:
- Inspection of the core journal for chemoinformatics, the Journal
of Chemical Information and Modeling, where many articles are
published by industrial, rather than academic, researchers and where the
Sheffield group has provided more contributions than any other
organization over the journal's 53-year lifetime.
- The Sheffield group being the recipient of the 2012 Jason Farradane
Award of the UK Electronic Information Group "in recognition of
outstanding work in the information profession", the commendation
highlighting the "joint research projects with many of the world's
leading pharmaceutical, agrochemical and software companies".
- Willett being the only person to have been the recipient of all three
of the American Chemical Society's awards for contributions to
chemoinformatics and the pharmaceutical industry: the 1993 Herman
Skolnik Award, the 2005 Award for Computers in Chemical and
Pharmaceutical Research, and the 2010 Patterson-Crane Award.
The group's publications since 1993 have attracted over ten thousand
citations in the Web of Science (WoS), many of which reflect the impact
of the work on industrial research. Multinational
pharmaceutical companies that have cited Sheffield research at least 20
times include Abbott, AstraZeneca, Bristol-Myers Squibb, Eli Lilly, GSK,
Merck, Novartis, Pfizer and Roche. The work has also been extensively
cited by the world's major chemoinformatics software and database
companies; those that have cited Sheffield work at least five times
include Accelrys, BCI, CCDC, CCG, CEREP, De Novo, Leadscope, Molecular
Networks, OpenEye, Schrödinger and Tripos. As [S1] notes, "The group
can boast a prodigious corpus of published research as well as a number
of prestigious alumni in industry and academia, with the result that
most serious researchers in the field have had some direct or indirect
contact with them....They also provide much-needed leadership to
chemoinformaticians - in the pharmaceutical industry as well as in the
academic community - by maintaining exceptional standards of objectivity
and critical examination of their own work that are typically weaknesses
in our field of research".
References to the research
R1. Jones, G., Willett, P. & Glen, R.C. "Molecular recognition of
receptor sites using a genetic algorithm with a description of
desolvation". Journal of Molecular Biology, 245, 1995,
43-53. (5-year journal impact factor IF=3.89, 691 citations in Web of
Science)
R2. Jones, G., Willett, P., Glen, R.C., Leach, A.R. & Taylor, R.
"Development and validation of a genetic algorithm for flexible docking".
Journal of Molecular Biology, 267, 1997, 727-748. (IF=3.89,
2350 citations)
R3. Jones, G., Willett, P. & Glen, R.C. "A genetic algorithm for
flexible molecular overlay and pharmacophore elucidation". Journal of
Computer-Aided Molecular Design, 9, 1995, 532-549.
(IF=3.61, 246 citations)
R4. Richmond, N.J., Willett, P. & Clark, R.D. "Alignment of
three-dimensional molecules using an image recognition algorithm". Journal
of Molecular Graphics and Modelling, 23, 2004, 199-209. (IF=2.22,
36 citations)
R5. Richmond, N.J., Abrams, C., Wolohan, P.R.N., Abrahamian, E., Willett,
P. & Clark, R.D. "GALAHAD. 1. Pharmacophore identification by
hypermolecular alignment of ligands in 3D". Journal of Computer-Aided
Molecular Design, 20, 2006, 567-587. (IF=3.61, 80 citations)
R6. Gillet, V.J., Khatib, W., Willett, P., Fleming, P.J. & Green,
D.V.S. "Combinatorial library design using a multiobjective genetic
algorithm". Journal of Chemical Information and Computer Sciences,
42, 2002, 375-385. (IF=4.07, 99 citations)
Details of the impact
The impact of the Sheffield chemoinformatics work in terms of increasing
economic prosperity for the pharmaceutical industry and increasing the
range of medicines available to patients has been two-fold: by means of
software developed in Sheffield and then distributed to industry; and by
means of extensive industrial adoption of algorithms and methods published
by the group in the open literature. The reach of the work is demonstrated
by its current use throughout the pharmaceutical industry and the
significance by its role in helping to discover new drugs that can benefit
patients throughout the world.
GOLD has been distributed by CCDC since 1998 and is currently used for
pharmaceutical research in over 50 different countries:
- "For many years, GOLD has been one of the top two protein-ligand
docking programs for industrial and academic research in drug
discovery, and is used by around 60 major pharmaceutical companies and
more than 600 Universities worldwide" [S2];
- "My group of 18 modellers and chemoinformaticians are actively
involved in the support of many drug discovery projects; our work has
led to the discovery of several novel chemical entities now in the
clinic or on the market. One of the tasks we perform is virtual
screening of large databases of chemical structure, and for that
purpose we regularly use GOLD as the tool of choice" [S3].
GASP has been distributed by Tripos since 1996, with an enhanced version
available since 2003, and GALAHAD has been distributed by them since 2005.
These three products have been very successful commercially, with total
sales to date of ca. £7 million and with royalties to the University of
Sheffield of ca. £900K (£82K in 2008-12). The software licensees include
many of the world's leading chemical companies; GASP and GALAHAD users,
for example, include Amgen, Bayer, Boehringer Ingelheim, Bristol-Myers
Squibb, DuPont, Eli Lilly, Genentech, GSK, Novo Nordisk, Sanofi-Aventis
and Syngenta inter alia.
License holders such as these have made very extensive use of the
software, as demonstrated by published accounts in the scientific
literature of the use of GALAHAD, GASP and GOLD in companies' internal
drug-discovery projects. WoS searches in October 2013 for the six
references listed above (R1-R6) reveal over 250 citations to this research
by industrial companies in 2008-13. For example, articles in two leading
medicinal chemistry journals (Journal of Medicinal Chemistry
(IF=5.38) and Bioorganic and Medicinal Chemistry (IF=3.15))
describe the use of the Sheffield software by world-leading pharmaceutical
companies to support research in cancer (Genentech), heart disease (Pfizer
and Procter & Gamble), cellular signalling (GSK), cognitive impairment in
conditions such as schizophrenia and Alzheimer's disease (Abbott), and
obesity (Takeda) inter alia.
The impact of the Sheffield work is much broader than just the three
specific commercial products, in that techniques described by the
Sheffield group in the open literature have been widely adopted, as
evidenced both by literature citation and by incorporation in software
that is used around the world on a daily basis. For example:
- "We also follow the work of the Sheffield group keenly and we have
adopted some of their ideas on data fusion techniques and
multi-objective optimization into our own code; we have done this more
with the research coming out of this department than we have with any
other academic cheminformatics group in Europe" [S3];
- "Sheffield research has been at the heart of all the commonly
used, indispensable systems for handling chemical reactions, and 2D,
3D and generic chemical structures in industry.....Very few
universities worldwide are capable of addressing the "virtual R&D"
problem; Sheffield is the leader" [S4].
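The [S3] comment above mentions data fusion. As an illustration only (the text does not say which fusion rules were adopted), here are two simple rules commonly discussed in the similarity-searching literature: summing each molecule's rank positions across several search methods, and taking each molecule's maximum score across them. The function names, molecule identifiers and scores are hypothetical.

```python
def rank_sum_fusion(rankings: list[list[str]]) -> list[str]:
    """Fuse ranked lists (best first) by summing each molecule's rank
    positions; assumes every molecule appears in every list."""
    totals: dict[str, int] = {}
    for ranking in rankings:
        for position, mol in enumerate(ranking, start=1):
            totals[mol] = totals.get(mol, 0) + position
    # Lowest summed rank first; ties broken by name so output is deterministic.
    return sorted(totals, key=lambda mol: (totals[mol], mol))


def max_score_fusion(score_lists: list[dict[str, float]]) -> list[str]:
    """Fuse similarity scores (higher is better) by keeping each molecule's
    best score over all lists, then ranking on that maximum."""
    best: dict[str, float] = {}
    for scores in score_lists:
        for mol, score in scores.items():
            best[mol] = max(score, best.get(mol, float("-inf")))
    return sorted(best, key=lambda mol: (-best[mol], mol))


fingerprint_rank = ["m1", "m2", "m3", "m4"]
shape_rank = ["m3", "m1", "m4", "m2"]
print(rank_sum_fusion([fingerprint_rank, shape_rank]))  # ['m1', 'm3', 'm2', 'm4']
print(max_score_fusion([{"m1": 0.8, "m2": 0.3},
                        {"m1": 0.4, "m2": 0.9, "m3": 0.5}]))  # ['m2', 'm1', 'm3']
```

Rank-based fusion sidesteps the problem that different methods produce scores on incomparable scales, whereas score-based fusion keeps more information when the scales do match.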
Sources to corroborate the impact
S1. Email from Director of Life Sciences, Simulations Plus corroborates
impact of Sheffield chemoinformatics work on the pharmaceutical industry
in general.
S2. Email from Emeritus Research Fellow, Cambridge Crystallographic Data
Centre corroborates reach of Sheffield chemoinformatics work in the
pharmaceutical industry.
S3. Email from Global Head of Computer-Aided Drug Discovery, Novartis
corroborates use of Sheffield chemoinformatics work and corporate use of
GOLD at Novartis.
S4. Email from Associate Editor, Journal of Chemical Information and
Modeling corroborates impact of Sheffield chemoinformatics work on
the pharmaceutical industry in general; the author also has detailed
knowledge of the group's contribution to the Journal of Chemical
Information and Modeling.
S5. Director of Computational Chemistry Europe, GlaxoSmithKline. Can
corroborate impact of Sheffield chemoinformatics work on the
pharmaceutical industry in general, and has detailed knowledge of the
development of GOLD.