Nearly every large-vocabulary speech recognition system in current use
employs outputs from fundamental research carried out in the University of
Cambridge Department of Engineering (DoEng) on adaptation of Hidden Markov
Models (HMMs). One example of the commercial application of these outputs
is their use on the Microsoft Windows desktop for both the command and
control functions and the dictation functions. Approximately one billion
copies of Windows have been shipped since 2008. Other examples show the
outputs used in the automatic transcription of a wide range of types of
data. [text removed for publication]
Research in robust speech enhancement and audio-visual processing has had
impact on several fronts:
(i) Collaboration with CSR, a leading $1 billion consumer
electronics company, has shaped its R&D agenda in speech
enhancement, has inspired ideas for new product improvements, and has
helped establish Belfast as an audio research centre of excellence within
(ii) Our technology has changed the strategic R&D direction of a
company delivering healthcare monitoring systems, with potential
for multi-million pound savings in NHS budgets.
(iii) Audio-visual speech processing research has led to a
proof-of-concept biometric system, Liopa: a novel,
robust and convenient person authentication and verification technology
exploiting lip and facial movements (www.liopa.co.uk). A start-up company
is in an advanced stage of being established to commercialise this
product. The product and commercialisation strategy was awarded First
Prize in the 2013 NISP Connect £25K entrepreneurship competition in the
Digital Media and Software category. The first commercial partner for
Liopa has been engaged.
(iv) A system-on-chip implementation of a version of our speech
recognition engine, which was developed through an EPSRC project, was
awarded first prize in the High Technology Award in the 2010 NISP £25K
Awards competition, and contributed to the founding of a spin-out company,
Analytics Engines (www.analyticsengines.com).
Speech Graphics Ltd is a spin-out company from the University of
Edinburgh, building on research into the animation of talking heads during
2006-2011. Speech Graphics' technology is the first high fidelity lip-sync
solution driven by audio. Speech Graphics market a multi-lingual, scalable
solution to audio-driven animation that uses acoustic analysis and muscle
dynamics to drive the faces of computer game characters, accurately
matching the words and emotion in the audio. The industry-leading
technology developed by Speech Graphics has been used to animate
characters in computer games developed by Supermassive Games in 2012 and
in music videos for artists such as Kanye West in 2013.
This impact case study provides evidence of economic impacts of
our research because:
i) a spin-out company, Speech Graphics Ltd, has been created, established
its viability, and gained international recognition;
ii) the computer games industry and the music video industry have adopted
a new technology founded on University of Edinburgh research into a novel
technique to synthesize lip motion trajectories using Trajectory Hidden
Markov Models; and
iii) this has made the creation of computer games more cost-effective:
games can be sold worldwide because their dialogue can be more easily
specialised for different human languages, with rapid creation of
high-quality facial animation replacing a combination of motion capture
and manual animation.
Stroke and other forms of brain injury often result in debilitating
communication impairments. For example, patients with acquired apraxia of
speech (AOS) experience difficulties that affect their capacity to
verbally express thoughts and needs. Such individuals have benefitted from
the development of a novel computerised treatment, "Sheffield
Word" (SWORD). Patients who took part in clinical trials showed improvements
in aspects of speech that were impaired after stroke. SWORD is now
used by healthcare teams worldwide, providing benefits to a large patient
population. The SWORD computerised treatment is convenient to use at
home, fosters users' autonomy, and delivers higher treatment
doses than possible through traditional clinical sessions.
Clinicians who treat AOS have also benefitted through education, training
and access to online materials about SWORD which were provided by the
Edinburgh's research in multilingual speech synthesis has had clinical
and commercial impact, and has resulted in a large and diverse community
of users.
Clinical applications: Our research has enabled the construction
of natural-sounding, personalised synthetic voices from recordings of
speech from people with disordered speech due to conditions such as
Parkinson's disease or Motor Neurone Disease. These synthetic voices are
used in assistive technology devices that allow people with these
conditions to communicate more easily and effectively.
Commercial take-up: Our research has achieved commercial impact
through the licensing of technology components, and through the activities
of start-up companies.
Community of users: The Festival Speech Synthesis System (v2.1
released in November 2010) is a complete open-source text-to-speech system
released under an unrestrictive X11-type license, and is distributed as
part of many major Linux distributions.
Articulate Instruments Ltd. was founded in 2003 as a research, design,
manufacturing and consulting company for users of phonetic
instrumentation. It invents, designs, markets and supports instrumental
technologies for normative and clinical speech science and for the
diagnosis and treatment of speech disorders. Products include electronic
systems, headsets, software, and methodologies, underpinned by QMU
research. Clinical use of relevant products as medical devices requires
"CE marking" to prove on-going safety and support, first achieved in
Impact relates primarily to the company's on-going financial
health and its non-academic customer base. In its first 10
years, turnover averaged ~£120k, with over 200 customers internationally,
of whom more than 50 were non-academic.
Our research on speech synthesis is embodied in software tools which we
make freely available. This has led to widespread use and commercial
success, including direct companies and use by major corporations. This
same research benefits people who lose the ability to speak and have to
rely on computer-based communication aids. Unlike existing aids, which
provide a small range of inappropriate voices which are often not
accepted by users, our technology can uniquely create intelligible and
normal-sounding personalised voices from recordings even of disordered
speech, and so enable people to communicate and retain personal identity
and dignity.
Our research is concerned with enabling access to mathematical literature
for users with visual impairments (i.e., blind or partially sighted users)
or print impairments (i.e., users with specific learning disabilities like
dyslexia or dysgraphia). The impact is therefore primarily societal: we
give visually impaired and print-impaired learners access to scientific
and mathematical knowledge from which they were previously excluded,
thereby furthering an inclusive teaching and learning environment. With
over 1 million people with learning disabilities (http://www.learningdisabilities.org.uk/help-information/Learning-Disability-Statistics-/)
and the number of visually impaired people predicted to rise above 2
million by 2020 in the UK alone (http://www.rnib.org.uk/aboutus/Research/statistics/Pages/statistics.aspx),
the work is significant in providing equal opportunities to learners in
the STEM subjects.
Our research has led to work with Google Inc. to enhance mathematics
accessibility on the Web via the screen-reader ChromeVox. It enables the
full text-to-speech translation of mathematics on the Web for all
users of the Chrome browser and Android platforms and has been
included in ChromeVox since version 27, released 9/5/2013, which had
31,518 downloads from the Chrome store as of 11/10/2013.
Our work has also resulted in an assistive technology tool, called
MaxTract, for providing access to teaching and learning material. It has
been deployed within digital mathematics libraries to enhance
accessibility to online material. Through direct feedback we are aware of
a number of visually impaired users who have actively used our tool.
GSM and 3G mobile systems do not currently support end-to-end security in
the form of encryption for speech. Research at Surrey has created new
speech technology which allows complete end-to-end security via the mobile
speech channel. This world-first secure-from-eavesdropping mobile
phone system is available anywhere there is mobile coverage.
A Surrey spin-out, MulSys Ltd., has licensed the technology to security
agencies and is now developing a mass market product.
The psycholinguistic framework for research and practice developed by
Stackhouse and Wells is now a key component of the majority of UK speech
and language therapy courses at undergraduate and postgraduate levels. In
addition to influencing the design and delivery of course curricula in the
UK, Europe, Australia, South Africa and USA, the framework is used in
continuing professional development for speech and language therapists
(SLTs) and special needs teachers, and in work with parents. The resultant impact on
clinical and educational practice, the assessment of children and the
planning of therapy interventions can be seen across the spectrum of
persisting speech difficulties, including those related to dyspraxia,
dysarthria, dyslexia, cleft palate, Down Syndrome, stammering and specific
speech and language impairments.