The impact is primarily in Public Health. It mainly concerns the adoption
of and demand for a speech research technology, Electropalatography (EPG),
for clinical diagnosis and treatment of speech disorders. Our continuing
long-term, interdisciplinary research into EPG has increased our impact
in this census period relative to the previous one (RAE2008), by which
time the UOA had already been awarded a Queen's Anniversary Prize (2002)
for working towards the clinical application of speech science.
Financial support from the charitable sector and the NHS for the training
of classroom assistants and SLTs in EPG therapy is highlighted, along with
user testimonials, unmet demand, and small-scale provision of the therapy.
Research in robust speech enhancement and audio-visual processing has led
to impact on a range of different fronts:
(i) Collaboration with CSR, a leading $1 billion consumer
electronics company, has shaped its R&D agenda in speech
enhancement, has inspired ideas for new product improvements, and has
helped establish Belfast as an audio research centre of excellence within
(ii) Our technology has changed the strategic R&D direction of a
company delivering healthcare monitoring systems, with potential
for multi-million pound savings in NHS budgets.
(iii) Audio-visual speech processing research has led to a
proof-of-concept biometric system, Liopa: a novel,
robust and convenient person authentication and verification technology
exploiting lip and facial movements (www.liopa.co.uk). A start-up company
is in an advanced stage of being established to commercialise this
product. The product and commercialisation strategy was awarded First
Prize in the 2013 NISP Connect £25K entrepreneurship competition in the
Digital Media and Software category. The first commercial partner for
Liopa has been engaged.
(iv) A system-on-chip implementation of a version of our speech
recognition engine, which was developed through an EPSRC project, was
awarded first prize in the High Technology Award in the 2010 NISP £25K
Awards competition, and contributed to the founding of a spin-out company,
Analytics Engines (www.analyticsengines.com).
Edinburgh's research in multilingual speech synthesis has had clinical
and commercial impact, and has resulted in a large and diverse community
of users.
Clinical applications: Our research has enabled the construction
of natural-sounding, personalised synthetic voices from recordings of
speech from people with disordered speech due to conditions such as
Parkinson's disease or Motor Neurone Disease. These synthetic voices are
used in assistive technology devices that allow sufferers of these
conditions to communicate more easily and effectively.
Commercial take-up: Our research has achieved commercial impact
through the licensing of technology components, and through the activities
of start-up companies.
Community of users: The Festival Speech Synthesis System (v2.1
released in November 2010) is a complete open-source text-to-speech system
released under an unrestrictive X11-type licence, and is distributed as
part of many major Linux distributions.
One of the world's leading systems for large-vocabulary Automatic Speech
Recognition (ASR) has been developed by a team led from the University of
Sheffield. This system, which won the international evaluation campaigns
for rich speech transcription organised by the US National Institute of
Standards and Technology (NIST) in 2007 and 2009, has led directly to the
creation of one spin-out, been largely instrumental in the launch of a
second, has had significant impact on the development and growth of three
existing companies, and has made highly
available free for the first time to a broad range of individual and
organisational users, with applications including language learning,
speech-to-speech translation and access to education for those with
reading and writing difficulties.
Stroke and other forms of brain injury often result in debilitating
communication impairments. For example, patients with acquired apraxia of
speech (AOS) experience difficulties that affect their capacity to
verbally express thoughts and needs. Such individuals have benefitted from
the development of a novel computerised treatment, "Sheffield
Word" (SWORD). Patients who took part in clinical trials showed improvements
in aspects of speech that were impaired after stroke. SWORD is now
used by healthcare teams worldwide, providing benefits to a large patient
population. The SWORD computerised treatment is convenient to use at
home, fosters users' autonomy, and delivers higher treatment
doses than are possible through traditional clinical sessions.
Clinicians who treat AOS have also benefitted through education, training
and access to online materials about SWORD which were provided by the
Speech Graphics Ltd is a spin-out company from the University of
Edinburgh, building on research into the animation of talking heads
carried out during 2006-2011. Speech Graphics' technology is the first
high-fidelity lip-sync solution driven by audio. Speech Graphics market a
multi-lingual, scalable solution to audio-driven animation that uses
acoustic analysis and muscle dynamics to drive the faces of computer game
characters, accurately matching the words and emotion in the audio. The
industry-leading technology developed by Speech Graphics has been used to
animate characters in computer games developed by Supermassive Games in
2012 and in music videos for artists such as Kanye West in 2013.
This impact case study provides evidence of economic impacts of
our research because:
i) a spin-out company, Speech Graphics Ltd, has been created, established
its viability, and gained international recognition;
ii) the computer games industry and the music video industry have adopted
a new technology founded on University of Edinburgh research into a novel
technique to synthesize lip motion trajectories using Trajectory Hidden
Markov Models; and
iii) this has made the creation of computer games more cost-effective:
games can be sold worldwide because their dialogue can be more easily
specialised into different human languages, with rapid creation of
high-quality facial animation replacing a combination of motion capture
and manual animation.
Our research on speech synthesis is embodied in software tools which we
make freely available. This has led to widespread use and commercial
success, including direct
companies and use by major corporations. This same research benefits
people who lose the ability to speak and have to rely on computer-based
communication aids. Unlike existing aids, which provide a small range of
inappropriate voices which are often not accepted by users, our
technology can uniquely create intelligible and normal-sounding
personalised voices from recordings even of disordered speech, and so
enable people to communicate and retain personal identity and dignity.
Nearly every large-vocabulary speech recognition system in current use
employs outputs from fundamental research carried out in the University of
Cambridge Department of Engineering (DoEng) on adaptation of Hidden Markov
Models (HMMs). One example of the commercial application of these outputs
is their use on the Microsoft Windows desktop for both the command and
control functions and the dictation functions. Approximately one billion
copies of Windows have been shipped since 2008. Other examples show the
outputs used in the automatic transcription of a wide range of types of
data. [text removed for publication]
GSM and 3G mobile systems do not currently support end-to-end security in
the form of encryption for speech. Research at Surrey has created new
speech technology which allows complete end-to-end security via the mobile
speech channel. This world-first, secure-from-eavesdropping mobile
phone system is available anywhere there is mobile coverage.
A Surrey spin-out, MulSys Ltd., has licensed the technology to security
agencies and is now developing a mass market product.
Forensic speaker comparison is the analysis of recorded speech with
evidential value in legal (usually criminal) cases. It is now routinely
undertaken in the UK (ca. 600 cases annually) and increasingly elsewhere.
It is vital that casework is underpinned by robust research, that reliable
methods are applied, and that evidential results are framed appropriately.
York hosts one of the world's largest research groups in forensic speech
science, and in those academic disciplines (phonetics, sociolinguistics,
sociophonetics) that provide the essential foundation for this applied
field. The impacts of York research are felt through (i) enhancing
understanding of variation in speech; (ii) applying research findings via
collaboration in casework and research with J P French Associates (JPFA),
one of the world's leading laboratories; (iii) providing doctoral research
supervision for JPFA staff and professional training for other experts;
(iv) providing expert evidence in legal cases in the UK and
internationally; and (v) improving policy on expert evidence in the UK.