PolySNAP Computer Software for enhanced processing and classifying of crystallographic and spectroscopic data
Submitting Institutions
University of Strathclyde,
University of Glasgow
Unit of Assessment
Chemistry
Summary Impact Type
Technological
Research Subject Area(s)
Mathematical Sciences: Statistics
Chemical Sciences: Macromolecular and Materials Chemistry, Physical Chemistry (incl. Structural)
Summary of the impact
PolySNAP is an extensive commercial computer program developed at
WestCHEM to process and
classify large volumes of crystallographic and spectroscopic data. It is a
market-leading product
sold and supported by Bruker Corporation (a manufacturer of scientific
instruments for molecular
and materials research selling products world-wide) and is used in
laboratories throughout the
world supporting business in the pharmaceutical, materials, mining,
geology, and polymer science
sectors. The PolySNAP software was and continues to be sold in combination
with all Bruker x-ray
powder diffractometers.
Underpinning research
Context
The research undertaken at WestCHEM addresses the problem of analysing
large quantities of data
from x-ray powder diffraction and one-dimensional spectroscopy, in
particular Raman
and IR spectroscopy [1,5]. In commercial research settings, it is now
routine to collect thousands of
x-ray powder-diffraction patterns in a day using crystallisation robots
and a fast powder
diffractometer. Many pharmaceutical and materials laboratories carry out
such work routinely, one
example being in the pharmaceutical industry where companies need to find
all the possible
polymorphs of a drug candidate both to protect their intellectual property
and to ensure that more
efficacious forms are not excluded. Using crystallisation robots,
thousands of potential polymorphs
are crystallised in small quantities and subjected to x-ray powder
diffraction and, if possible, to
Raman spectroscopy.
Key Research Outcomes
The motivation for the research carried out at WestCHEM was to explore
ways to classify the data
automatically and to provide associated data visualisation tools. The
PolySNAP software classifies
the patterns automatically using cluster analysis and multivariate
statistics showing groups of
patterns that are similar and thus belong to the same polymorph as well as
highlighting new
patterns that do not match the library of known forms [2-4]. From its
first release, PolySNAP's
approach has been revolutionary in that it uses every measured data point,
whereas previously
software used only the heights and positions of the top 5-10 peaks [6].
This makes it far more
robust in its statistics especially with regard to poor quality data,
which are common in high-
throughput studies.
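As a minimal illustration of the idea, and not PolySNAP's actual implementation, the following Python sketch compares synthetic profiles point by point using a Pearson correlation coefficient and then groups them by single-linkage clustering. The function names, the distance definition d = (1 - r)/2, the cutoff of 0.2 and the synthetic Gaussian patterns are all assumptions made for this example.

```python
import math

def pearson(a, b):
    """Pearson correlation over every point of two equal-length profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def cluster(patterns, cutoff=0.2):
    """Single-linkage grouping: patterns linked by a distance below `cutoff`
    end up with the same label (hypothetical cutoff value)."""
    n = len(patterns)
    # Assumed distance: d = (1 - r) / 2 maps r in [-1, 1] onto [0, 1].
    dist = [[0.5 * (1.0 - pearson(patterns[i], patterns[j])) for j in range(n)]
            for i in range(n)]
    labels = list(range(n))
    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] < cutoff and labels[i] != labels[j]:
                old, new = labels[j], labels[i]
                labels = [new if lab == old else lab for lab in labels]
    return labels

def gaussian_pattern(peaks, n_points=200):
    """Synthetic profile: a sum of Gaussian peaks given as (position, height)."""
    return [sum(h * math.exp(-((x - p) ** 2) / 8.0) for p, h in peaks)
            for x in range(n_points)]

# Two slightly different measurements of one form, plus a distinct form.
form_a1 = gaussian_pattern([(40, 100), (90, 60), (150, 30)])
form_a2 = gaussian_pattern([(41, 95), (90, 65), (150, 28)])
form_b = gaussian_pattern([(60, 80), (120, 100), (170, 50)])

labels = cluster([form_a1, form_a2, form_b])
print(labels)  # the two form-A profiles share a label; form B has its own
```

Because the comparison uses the whole profile rather than a short peak list, small shifts or intensity changes in individual peaks lower the correlation only slightly.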
The WestCHEM research has led to a new integrated approach to full
powder-diffraction pattern
analysis. It incorporated wavelet-based data pre-processing, parametric
and non-parametric
statistical tests for full-pattern matching, and singular-value
decomposition to extract quantitative
phase information from mixtures. Every measured data point is used in both
qualitative and
quantitative analyses. The success of this new integrated approach was
demonstrated through
examples using several test data sets [6].
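The quantitative step can be pictured as fitting a mixture pattern as a weighted sum of pure-phase patterns. The sketch below is an illustration under stated assumptions only: it solves the least-squares problem through the normal equations rather than singular-value decomposition (chosen here purely to avoid external libraries), it assumes the pure-phase patterns are on a common intensity scale so that the fitted weights can be read as fractions, and all names and data are hypothetical.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def phase_fractions(pure_patterns, mixture):
    """Least-squares weight of each pure phase, normalised to sum to 1."""
    k = len(pure_patterns)
    # Normal equations: (P P^T) w = P m, using every measured data point.
    gram = [[sum(a * b for a, b in zip(pure_patterns[i], pure_patterns[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(a * m for a, m in zip(pure_patterns[i], mixture))
           for i in range(k)]
    weights = solve(gram, rhs)
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical pure-phase profiles with one overlapping data point.
pure_a = [0.0, 1.0, 5.0, 1.0, 0.0, 0.0, 0.0, 0.0]
pure_b = [0.0, 0.0, 0.0, 1.0, 2.0, 6.0, 2.0, 0.0]
mixture = [0.7 * a + 0.3 * b for a, b in zip(pure_a, pure_b)]

fractions = phase_fractions(pure_a_b := [pure_a, pure_b], mixture)
print(fractions)
```

For real data the normal equations can be poorly conditioned when phases have similar patterns, which is one reason a decomposition-based solver such as SVD is generally preferred.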
Key researchers
The initial research was carried out at WestCHEM under a grant from the
Ford Motor Company
(USA and Germany) starting in 2000. Major developments were commissioned
by Pfizer in Kent
and involved Professor Chris Gilmore (Prof at WestCHEM 1973-2010) along
with Dr Gordon Barr
(PDRA at WestCHEM 2002-10), and Dr Wei Dong (PDRA at WestCHEM 2003-07).
Grants and
other income continued from various sources until 2010 including an EPSRC
Adventurous
Research in Chemistry grant (2006).
References to the research
References 2, 3 and 5 best illustrate the quality of the research:
[1] `High-throughput powder diffraction. I. A new approach to qualitative
and quantitative powder
diffraction pattern analysis using full pattern profiles', C.J. Gilmore,
G. Barr, and J. Paisley, J.
Appl. Cryst. (2004), 37, 231-242. (doi: 10.1107/S002188980400038X)
[2] `High Throughput Powder Diffraction: II Applications of Clustering
Methods and Multivariate
Data Analysis', G. Barr, W. Dong and C.J. Gilmore, J. Appl. Cryst.
(2004), 37, 243-252. (doi:
10.1107/S0021889804000391)
[3] `High-throughput powder diffraction. III. The application of
full-profile pattern matching and
multivariate statistical analysis to round-robin-type data sets', Gordon
Barr, Wei Dong,
Christopher Gilmore and John Faber, J. Appl. Cryst. (2004), 37,
635-642. (doi:
10.1107/S0021889804013743)
[4] `High Throughput Powder Diffraction: IV Cluster Validation using
Silhouettes and Fuzzy
Clustering', G. Barr, W. Dong, and C.J. Gilmore, J. Appl. Cryst.
(2004), 37, 874-882.
(doi:10.1107/S0021889804020990)
[5] `High Throughput Powder Diffraction V: The Use of Raman Spectroscopy
with and without X-ray
Powder Diffraction data', G. Barr, G. Cunningham, W. Dong, C.J. Gilmore
and T. Kojima, J.
Appl. Cryst. (2009), 42, 706-714. (doi: 10.1107/S0021889809022924)
[6] `PolySNAP3: a computer program for analysing and visualizing
high-throughput data from
diffraction and spectroscopic sources', G. Barr, W. Dong and C.J. Gilmore,
J. Appl. Cryst.
(2009), 42, 965-974. (doi: 10.1107/S0021889809025746)
PolySNAP was developed using grants to Prof Gilmore from Ford Motor
Company (2000), Pfizer
(2003, 2004, 2005), Bruker, the Cambridge CDC (2006), and EPSRC
(`Adventurous Research in
Chemistry', 2006) for a total value of £712k.
Details of the impact
WestCHEM develops software to support high-throughput systems
At the same time that the precursor to PolySNAP, the SNAP-1D software
package, was being
developed at WestCHEM to facilitate an integrated approach to full x-ray
powder-diffraction pattern
analysis, the pharmaceutical industry was developing automated
crystallisation systems using
robotics, and was beginning to generate large volumes of one-dimensional
diffraction patterns.
However, there was no software available to process, classify, and
visualise such data. PolySNAP
was written under contract from Pfizer specifically to address this
problem. It was the first software
to do this and used novel statistics and visualisation methods. Data
visualisation is a vital
component of handling large amounts of data. With PolySNAP, time
expenditure was reduced from
days to seconds for >100 data sets. It is an almost impossible task to
classify large data sets
without these techniques and so any investment in the new robotic hardware
would have been
pointless. The software was described by one Pfizer employee as "like
having a new spectroscopic
method made available". A confidential list of PolySNAP users up to 2010
is available, which
contains 72 users with commercial licences (Source 2). The software was
the property of Glasgow
University until 2010 when Professor Gilmore retired, at which time it was
sold to Bruker.
PolySNAP supplied worldwide with Bruker diffractometers
Bruker purchased the rights to the program from Pfizer in 2002-2003 for ca.
£125k (the same as
the original cost to Pfizer) and started to market PolySNAP commercially
under licence from
Glasgow University (Source 1). Until 2010, licence fees to the University
totalled £84,164 (of which £37,894 was received from
2008 onwards) (Source 3). Bruker continued to develop this software and
Gilmore set up a limited
company partly for this purpose (Allander Science Ltd.) and continues to
act as a consultant to
Bruker. Dr Dong was employed full-time by Bruker during the assessment
period. The
development of PolySNAP also led to a free computer program called dSNAP
that uses the same
ideas to classify and visualise the results of searches on the Cambridge
Structural Database (see
below).
New versions of the software were released: PolySNAP-2, PolySNAP-M, and
the latest PolySNAP-3.
PolySNAP-3 is unique in that it allows data from multiple sources,
techniques or data collection
strategies to be incorporated into the analysis. For example, data from
powder diffraction can be
combined with differential scanning calorimetry and Raman data to give a
combined analysis and
more accurate classifications of samples [5, 6]. This method is proving so
important that Bruker are
currently extending it to include x-ray fluorescence data in the new
releases of PolySNAP. It saves
money because it releases the time of highly paid scientific staff and it
is also objective, avoiding
the risk that exists with this sort of data that interpretation can become
subjective.
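One simple way to picture a combined analysis is to merge the pairwise distance matrices obtained from each technique before clustering. The following Python sketch uses a weighted mean for this purpose; it is an assumption made for illustration and is not claimed to be the method used in PolySNAP-3. The function name, the equal default weights and the example distances are hypothetical.

```python
def combine_distances(matrices, weights=None):
    """Weighted mean of several n-by-n distance matrices, one per technique."""
    n = len(matrices[0])
    if weights is None:
        weights = [1.0] * len(matrices)  # assume equal trust in each technique
    total = sum(weights)
    return [[sum(w * m[i][j] for w, m in zip(weights, matrices)) / total
             for j in range(n)] for i in range(n)]

# Hypothetical distances for three samples (0 = identical, 1 = unrelated).
# Powder diffraction barely separates samples 0 and 1; Raman does.
pxrd = [[0.0, 0.05, 0.70],
        [0.05, 0.0, 0.72],
        [0.70, 0.72, 0.0]]
raman = [[0.0, 0.60, 0.65],
         [0.60, 0.0, 0.10],
         [0.65, 0.10, 0.0]]

combined = combine_distances([pxrd, raman])
print(combined[0][1])  # larger than the powder-diffraction distance alone
```

The combined matrix can then be passed to any clustering routine in place of a single-technique matrix.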
The relationship between Bruker and the University continued until 2010
when Prof Gilmore
retired. At that point, Bruker paid the equivalent of two years' licence
fees (€30k) to purchase the
software outright. PolySNAP continues to be a part of the company's
development plan (Source 4).
The current list price of PolySNAP when purchased separately is €2,700, and
approximately 120 copies have
been sold since 2004; Bruker also offers another version, m-SNAP, at
a list price of €5,000.
PolySNAP continues to be sold in combination with all Bruker x-ray powder
diffractometers.
PolySNAP utilised across industry and education
Although the impact beyond the sale of the software is very difficult to
quantify, the beneficiaries of
this software are many:
- The pharmaceutical industry: Most large pharmaceutical companies use
PolySNAP: Novartis,
AstraZeneca, Pfizer, Roche, Syngenta, Sanofi Aventis, and many smaller
companies. It
dramatically reduces the timescale and improves the reliability of sample
screening, allowing even
very large datasets (>100,000 samples) to be pre-screened in minutes
to find the subset of
patterns most relevant to a given target. Prior to PolySNAP, such
searches would have used
reduced patterns in which the powder data are presented as a small set
of intensities and
positions for the top 5-10 peaks. PolySNAP was the first program to use
the full measured data
in all its analysis steps. High-throughput data are of poor quality and
using traditional reduced
patterns is ineffective at best and often worthless.
- Mining industry: e.g., companies such as Anglo Platinum and Mintek in
southern Africa, where
PolySNAP software is used to assess the quality of ore samples prior to
processing into pure
metals, and to group huge quantities of materials of different origin.
The software allows the
companies to carry out rapid comprehensive assessments of ore
variability in terms of bulk
mineralogical composition, which facilitates decision-making.
- Materials Science: identifying new materials, studying phase changes
with temperature and
pressure, simple quantitative analysis of mixtures.
- Forensic science: e.g., the South African Police Force has used PolySNAP
to match samples from
crime scenes with a database of known samples. It is especially valuable
in cases of
mineral theft.
A Principal Scientist, Materials Science, at AstraZeneca explains the
value of PolySNAP (Source
5):
"Identification of a suitable solid form for development of a
crystalline material still remains a
great challenge for the pharmaceutical industry. Due to the different
physical properties (e.g.,
dissolution, stability) of different polymorphs, extensive screening is
performed during the
early stages of development to ensure the solid form architecture of the
proposed molecule
is fully understood and there have been a number of costly issues where
compounds have
had to be reformulated due to changes of form after launch.
In screening for different solid forms, both manual and automated
technologies have been
utilised and a number of high throughput platforms have been developed
that utilise common
analytical techniques, e.g., PXRD for analysis of solid form. These
techniques yield large
quantities of analytical data which is quite difficult to comprehend and
can rapidly overwhelm
operators with excessive data. [...] PolySNAP utilises a wide range of
mathematical
techniques to help manage large data sets and successfully guides the
scientists into not
only the presence of the different forms but, more importantly, how
these forms have been
manufactured and hence how they can be controlled. It provides a range
of pattern-matching
methods and allows the user to have full control over understanding the
data avoiding `black
box' type results leaving the operator in control and reducing large
datasets to a much
smaller number of samples which require interpretation by scientists.
The results from
PolySNAP allow rapid identification of `safe' areas of working (e.g.,
solvents, temperature,
concentrations, etc.) and where further understanding is required. Most
recently, the extra
dimension of adding the capability of other analytical data (e.g.,
spectroscopic) has provided
further confidence and opportunities in the pattern matching."
A one-month trial version of the PolySNAP software is available from
WestCHEM, accompanied by
a tutorial and trial data (Source 6).
Visualising the Cambridge Structural Database with dSNAP
The WestCHEM research that led to PolySNAP also underpinned a free
computer program called
dSNAP (Source 7) that classifies and visualises the results of searches on
the Cambridge
Structural Database (CSD), a repository of over 500,000 crystal structures
derived from x-ray and
neutron diffraction. The CSD is by far the most important source of
structural data for small
molecules, but while it is easy to search this database, it can be a
problem to interpret the results.
dSNAP is an important tool for this. dSNAP simplifies the problem using
the same methodology as
PolySNAP: it clusters the search results into groups that are very similar
and thus allows the user
to treat the group as a single entity, so reducing the number of structures
that need to be examined.
Additional tools are provided to assist in this. A confidential list of
166 licensed dSNAP users up to
2010 is available (Source 2).
Sources to corroborate the impact
[1] Evidence of use of software by Bruker: Bruker website http://www.bruker.com/en/products/x-ray-diffraction-and-elemental-analysis/x-ray-diffraction/xrd-software/applications/xrd-software-applications/polysnap.html
[2] A confidential list of PolySNAP users up to 2010 is
available. [To be treated as commercially
sensitive] The list also contains 166 licensed dSNAP users up to 2010.
This software is free,
and is now downloadable without a licence, so users after 2010 are not
registered on the
spreadsheet.
[3] There is a confidential spreadsheet of royalty income from
Bruker arising from sales of the
PolySNAP software up to 2010. [To be treated as commercially sensitive]
[4] Product Manager XRD at Bruker AXS GmbH in Karlsruhe Germany can be
contacted to
confirm the value of PolySNAP to Bruker products.
[5] Statement from Principal Scientist, Materials Sciences, AstraZeneca
R&D provides evidence of
the value of PolySNAP in materials analysis
[6] A one-month trial version of the PolySNAP software can be downloaded
by sending an e-mail
to snap@chem.gla.ac.uk. The
program comes with a tutorial and trial data.
[7] The dSNAP software can be downloaded from:
http://www.chem.gla.ac.uk/snap/PolySNAP_index.html